[jira] [Created] (BEAM-12506) Change WindowedValueHolder into a Row Schema
Ning Kang created BEAM-12506: Summary: Change WindowedValueHolder into a Row Schema Key: BEAM-12506 URL: https://issues.apache.org/jira/browse/BEAM-12506 Project: Beam Issue Type: Improvement Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang The WindowedValueHolder is a Python type that requires a special `SafeFastPrimitivesCoder` instead of the native `FastPrimitivesCoder` in cache_manager to encode and decode. When reading its cache and applying an external transform such as a SqlTransform, it introduces a pickled Python coder that is not xLang friendly. We could build a Row schema to hold the WindowedValueHolder to make cache reading xLang friendly and also get rid of the additional layer of `SafeFastPrimitivesCoder`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
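For illustration, the proposed holder-as-row idea could look roughly like the sketch below. This is not the merged design; the field names, types, and the choice to carry pre-encoded bytes are assumptions made for the example.

```python
# Hypothetical sketch: modeling the contents of a WindowedValueHolder as a
# flat, schema-describable row, so a portable row coder could replace the
# pickled Python coder when the cache is read by an xLang transform such as
# SqlTransform. Field names and types are assumptions for illustration only.
import typing

class WindowedValueRow(typing.NamedTuple):
    value: bytes                  # the element, encoded with its own coder
    timestamp_micros: int         # event-time timestamp in microseconds
    windows: typing.List[bytes]   # encoded window objects
    pane_info: bytes              # encoded pane info

row = WindowedValueRow(
    value=b"\x00elem", timestamp_micros=0, windows=[b"global"], pane_info=b"\x0f")
```

Because every field is a schema-friendly primitive or list, a row like this can be described to non-Python SDKs, which is the point of replacing the pickled coder.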
[jira] [Work started] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform
[ https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-10708 started by Ning Kang. > InteractiveRunner cannot execute pipeline with cross-language transform > --- > > Key: BEAM-10708 > URL: https://issues.apache.org/jira/browse/BEAM-10708 > Project: Beam > Issue Type: Bug > Components: cross-language >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P2 > Time Spent: 19h 50m > Remaining Estimate: 0h > > The InteractiveRunner crashes when given a pipeline that includes a > cross-language transform. > Here's the example I tried to run in a jupyter notebook: > {code:python} > p = beam.Pipeline(InteractiveRunner()) > pc = (p | SqlTransform("""SELECT > CAST(1 AS INT) AS `id`, > CAST('foo' AS VARCHAR) AS `str`, > CAST(3.14 AS DOUBLE) AS `flt`""")) > df = interactive_beam.collect(pc) > {code} > The problem occurs when > [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66] > creates a copy of the pipeline by [writing it to proto and reading it > back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120]. > Reading it back fails because some of the pipeline is not written in Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform
[ https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reassigned BEAM-10708: Assignee: Ning Kang (Issue description quoted in the "Work started" message above.)
[jira] [Comment Edited] (BEAM-12391) WriteToAvro fails if fastavro loads its python implementation of writer
[ https://issues.apache.org/jira/browse/BEAM-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352650#comment-17352650 ] Ning Kang edited comment on BEAM-12391 at 5/27/21, 5:27 PM: Beam notebooks have mitigated the issue by setting an environment variable {code:json} {"env": {"LD_LIBRARY_PATH":"/opt/conda/lib"}} {code} in each IPython kernel. was (Author: ningk): Beam notebooks have mitigated the issue by setting an environment variable `"env": {"LD_LIBRARY_PATH":"/opt/conda/lib"}` in each IPython kernel. > WriteToAvro fails if fastavro loads its python implementation of writer > --- > > Key: BEAM-12391 > URL: https://issues.apache.org/jira/browse/BEAM-12391 > Project: Beam > Issue Type: Bug > Components: io-py-avro >Affects Versions: 2.25.0, 2.26.0, 2.27.0, 2.28.0, 2.29.0 >Reporter: Chris Chandler >Priority: P2 > > It's possible for fastavro to fail to correctly load its cython > implementation of the Writer class in which case it will silently fall back > to a pure python implementation. If this happens there's no outward > indication, but line 621 in io/avroio.py will fail because writer.fo is only > present on the cython implementation. > To reproduce you can modify fastavro's write.py to only use its fallback: > {code} > #from . import _write > from . import _write_py as _write > {code} > And then run a workflow that sinks to WriteToAvro(use_fastavro=True). -- This message was sent by Atlassian Jira (v8.3.4#803005)
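For reference, the mitigation described above amounts to a kernel spec entry along these lines. This is an illustrative kernel.json fragment; only the "env" entry comes from the comment, while the argv and display_name values are placeholders.

```json
{
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3 (Beam)",
  "language": "python",
  "env": {"LD_LIBRARY_PATH": "/opt/conda/lib"}
}
```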
[jira] [Commented] (BEAM-12391) WriteToAvro fails if fastavro loads its python implementation of writer
[ https://issues.apache.org/jira/browse/BEAM-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352650#comment-17352650 ] Ning Kang commented on BEAM-12391: -- Beam notebooks have mitigated the issue by setting an environment variable `"env": {"LD_LIBRARY_PATH":"/opt/conda/lib"}` in each IPython kernel. (Issue description quoted in the "Comment Edited" message above.)
[jira] [Commented] (BEAM-12391) WriteToAvro fails if fastavro loads its python implementation of writer
[ https://issues.apache.org/jira/browse/BEAM-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350620#comment-17350620 ] Ning Kang commented on BEAM-12391: -- Thanks for letting me know; I've taken the external issue. (Issue description quoted in the "Comment Edited" message above.)
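A general way to detect the silent fallback described in this issue is to check where a module was loaded from. This is a stdlib pattern, not a fastavro or Beam API; applying it to `fastavro._write` before relying on cython-only attributes such as `writer.fo` is an assumption, shown here with the pure-Python `json` package as a stand-in.

```python
# Illustrative pattern: detect whether a module is a compiled (C/cython)
# extension or a pure-Python fallback by inspecting its load origin.
import importlib

def is_compiled_extension(module_name: str) -> bool:
    """True if the module was loaded from a shared library (.so/.pyd)."""
    mod = importlib.import_module(module_name)
    origin = getattr(getattr(mod, "__spec__", None), "origin", None) or ""
    return origin.endswith((".so", ".pyd"))

print(is_compiled_extension("json"))  # False: json is a pure-Python package
```

A check like this at pipeline construction time would turn the silent fallback into an explicit warning instead of a failure at io/avroio.py line 621.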
[jira] [Commented] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform
[ https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347084#comment-17347084 ] Ning Kang commented on BEAM-10708: -- [~chamikara] Thanks for asking. The first PR, #14368, which introduced a new augmented_pipeline module to replace pipeline_instrument, has been merged. I'll continue working on this after finishing some other higher-priority tasks. Some known users depend on the old modules that use runner API roundtrips; I'll work with them to roll out the new module, and we might need to keep the old modules as deprecated for a while. (Issue description quoted in the "Work started" message above.)
[jira] [Updated] (BEAM-12096) Flake: test_progress_in_HTML_JS_when_in_notebook
[ https://issues.apache.org/jira/browse/BEAM-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-12096: - Fix Version/s: Not applicable Resolution: Fixed Status: Resolved (was: In Progress) > Flake: test_progress_in_HTML_JS_when_in_notebook > > > Key: BEAM-12096 > URL: https://issues.apache.org/jira/browse/BEAM-12096 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Ning Kang >Priority: P1 > Labels: flake > Fix For: Not applicable > > Time Spent: 3h 50m > Remaining Estimate: 0h > > * > https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/18008/testReport/junit/apache_beam.runners.interactive.utils_test/ProgressIndicatorTest/test_progress_in_HTML_JS_when_in_notebook/ > self = testMethod=test_progress_in_HTML_JS_when_in_notebook> def > test_progress_in_HTML_JS_when_in_notebook(self): > ie.current_env()._is_in_notebook = True pi_path = > 'apache_beam.runners.interactive.utils.ProgressIndicator' with > patch('IPython.core.display.HTML') as mocked_html, \ > patch('IPython.core.display.Javascript') as mocked_javascript, \ > patch(pi_path + '.spinner_template') as enter_template, \ patch(pi_path + > '.spinner_removal_template') as exit_template: > enter_template.format.return_value = 'enter' > exit_template.format.return_value = 'exit' @utils.progress_indicated def > progress_indicated_dummy(): mocked_html.assert_any_call('enter') > > progress_indicated_dummy() apache_beam/runners/interactive/utils_test.py:217: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ apache_beam/runners/interactive/utils.py:235: in > run_within_progress_indicator return func(*args, **kwargs) > apache_beam/runners/interactive/utils_test.py:215: in > progress_indicated_dummy mocked_html.assert_any_call('enter') _ _ _ _ _ _ _ _ > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = > , args = ('enter',) kwargs = {}, > expected = (('enter',), {}) 
actual = [call('\n https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min... You have limited Interactive Beam features since your ipython kernel is not connected any notebook frontend.')] cause = None, expected_string = "HTML('enter')" def assert_any_call(self, *args, **kwargs): """assert the mock has been called with the specified arguments. The assert passes if the mock has *ever* been called, unlike `assert_called_with` and `assert_called_once_with` that only pass if the call is the most recent one.""" expected = self._call_matcher((args, kwargs)) actual = [self._call_matcher(c) for c in self.call_args_list] if expected not in actual: cause = expected if isinstance(expected, Exception) else None expected_string = self._format_mock_call_signature(args, kwargs) raise AssertionError('%s call not found' % expected_string) from cause E AssertionError: HTML('enter') call not found /usr/lib/python3.7/unittest/mock.py:891: AssertionError
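For readers unfamiliar with the mock assertion in this flake: `assert_any_call` passes if the mock was called with the given arguments at any point, and raises otherwise, which is why the missing HTML('enter') call surfaces as a "call not found" AssertionError. A minimal stdlib illustration:

```python
# Minimal illustration of unittest.mock's assert_any_call, the assertion
# that fails in the flake above: it passes if the mock was *ever* called
# with the given arguments, and raises AssertionError otherwise.
from unittest.mock import Mock

html = Mock()
html("enter")
html("exit")

html.assert_any_call("enter")  # passes: 'enter' was among the recorded calls

try:
    html.assert_any_call("never")
except AssertionError:
    print("call not found")  # raised because 'never' was never passed
```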
[jira] [Updated] (BEAM-12165) ParquetIO sink should allow to pass an Avro data model
[ https://issues.apache.org/jira/browse/BEAM-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-12165: - Issue Type: Bug (was: Improvement) > ParquetIO sink should allow to pass an Avro data model > -- > > Key: BEAM-12165 > URL: https://issues.apache.org/jira/browse/BEAM-12165 > Project: Beam > Issue Type: Bug > Components: io-java-parquet >Reporter: Ning Kang >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > AvroParquetWriter instantiated in ParquetIO [1] does not specify the data > model. > The default is SpecificData model [2], while the AvroParquetReader is reading > with a GenericData model [3]. > ParquetIO should pass in the correct data model. > [1] > https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1052 > [2] > https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroParquetWriter.java#L163 > [3] > https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L704 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10901) Flaky test: PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
[ https://issues.apache.org/jira/browse/BEAM-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10901: - Resolution: Fixed Status: Resolved (was: Open) Mark this as a duplicate. > Flaky test: > PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection > --- > > Key: BEAM-10901 > URL: https://issues.apache.org/jira/browse/BEAM-10901 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Ning Kang >Priority: P1 > Labels: flake, flaky-test > > https://pipelines.actions.githubusercontent.com/COKSSHEuHyOLM1pP93G8aBnLpu5tMJfYiveSzILlm0cIfDvSEl/_apis/pipelines/1/runs/6628/signedlogcontent/15?urlExpires=2020-09-15T18%3A07%3A33.5773511Z&urlSigningMethod=HMACV1&urlSignature=OWYnI0Ba0e%2FDh2cPSD7%2BFoYc6dp9%2F%2BLJwAMGszdaO1M%3D > {{2020-09-15T17:47:23.9124315Z _ > PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection > _ > 2020-09-15T17:47:23.9127327Z [gw2] win32 -- Python 3.5.4 > d:\a\beam\beam\sdks\python\target\.tox\py35-win\scripts\python.exe > 2020-09-15T17:47:23.9128518Z > 2020-09-15T17:47:23.9134948Z self = > testMethod=test_able_to_cache_intermediate_unbounded_source_pcollection> > 2020-09-15T17:47:23.9136122Z > 2020-09-15T17:47:23.9136585Z def setUp(self): > 2020-09-15T17:47:23.9137012Z > ie.new_env() > 2020-09-15T17:47:23.9137348Z > 2020-09-15T17:47:23.9138982Z > apache_beam\runners\interactive\pipeline_instrument_test.py:46: > 2020-09-15T17:47:23.9139662Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > 2020-09-15T17:47:23.9141259Z > apache_beam\runners\interactive\interactive_environment.py:117: in new_env > 2020-09-15T17:47:23.9141999Z _interactive_beam_env.cleanup() > 2020-09-15T17:47:23.9142755Z > apache_beam\runners\interactive\interactive_environment.py:273: in cleanup > 2020-09-15T17:47:23.9144712Z cache_manager.cleanup() > 2020-09-15T17:47:23.9145453Z > 
apache_beam\runners\interactive\caching\streaming_cache.py:391: in cleanup > 2020-09-15T17:47:23.9147098Z shutil.rmtree(self._cache_dir) > 2020-09-15T17:47:23.9148525Z > c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:494: in rmtree > 2020-09-15T17:47:23.9149356Z return _rmtree_unsafe(path, onerror) > 2020-09-15T17:47:23.9150431Z > c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:384: in > _rmtree_unsafe > 2020-09-15T17:47:23.9151289Z _rmtree_unsafe(fullname, onerror) > 2020-09-15T17:47:23.9152010Z > c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:389: in > _rmtree_unsafe > 2020-09-15T17:47:23.9152756Z onerror(os.unlink, fullname, sys.exc_info()) > 2020-09-15T17:47:23.9153286Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > 2020-09-15T17:47:23.9153756Z > 2020-09-15T17:47:23.9155669Z path = > 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py35-win\\tmp\\it-87ky_iwj2483994137376\\full' > 2020-09-15T17:47:23.9156453Z onerror = .onerror at > 0x024259F42268> > 2020-09-15T17:47:23.9156951Z > 2020-09-15T17:47:23.9157456Z def _rmtree_unsafe(path, onerror): > 2020-09-15T17:47:23.9157949Z try: > 2020-09-15T17:47:23.9158409Z if os.path.islink(path): > 2020-09-15T17:47:23.9159821Z # symlinks to directories are > forbidden, see bug #1669 > 2020-09-15T17:47:23.9161488Z raise OSError("Cannot call > rmtree on a symbolic link") > 2020-09-15T17:47:23.9162128Z except OSError: > 2020-09-15T17:47:23.9162713Z onerror(os.path.islink, path, > sys.exc_info()) > 2020-09-15T17:47:23.9163363Z # can't continue even if onerror > hook returns > 2020-09-15T17:47:23.9163870Z return > 2020-09-15T17:47:23.9164305Z names = [] > 2020-09-15T17:47:23.9164715Z try: > 2020-09-15T17:47:23.9165740Z names = os.listdir(path) > 2020-09-15T17:47:23.9166181Z except OSError: > 2020-09-15T17:47:23.9166737Z onerror(os.listdir, path, > sys.exc_info()) > 2020-09-15T17:47:23.9167277Z for name in names: > 2020-09-15T17:47:23.9167796Z fullname = 
os.path.join(path, name) > 2020-09-15T17:47:23.9168270Z try: > 2020-09-15T17:47:23.9168993Z mode = os.lstat(fullname).st_mode > 2020-09-15T17:47:23.9169511Z except OSError: > 2020-09-15T17:47:23.9169942Z mode = 0 > 2020-09-15T17:47:23.9170392Z if stat.S_ISDIR(mode): > 2020-09-15T17:47:23.9170991Z _rmtree_unsafe(fullname, onerror) > 2020-09-15T17:47:23.9171493Z else: > 2020-09-15T17:47:23.9171895Z try: > 2020-09-15T17:47:23.9172276Z
[jira] [Commented] (BEAM-10901) Flaky test: PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
[ https://issues.apache.org/jira/browse/BEAM-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324133#comment-17324133 ] Ning Kang commented on BEAM-10901: -- [~bhulette] Yes, the root cause should be the same. Let's mark it a duplicate. (Issue description and traceback quoted in the "Updated" message above.)
[jira] [Commented] (BEAM-12178) ReadCacheTest flakes on Windows
[ https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324074#comment-17324074 ] Ning Kang commented on BEAM-12178: -- PR#14558 merged. Let's keep the ticket open for a while and see if this issue re-occurs. > ReadCacheTest flakes on Windows > --- > > Key: BEAM-12178 > URL: https://issues.apache.org/jira/browse/BEAM-12178 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, test-failures >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P1 > Labels: currently-failing, flake > Time Spent: 2h 50m > Remaining Estimate: 0h > > Example failure: > https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757 > {code} > ReadCacheTest.test_read_cache > > [gw4] win32 -- Python 3.6.8 > d:\a\beam\beam\sdks\python\target\.tox\py36-win\scripts\python.exe > self = testMethod=test_read_cache> > def setUp(self): > > ie.new_env() > apache_beam\runners\interactive\caching\read_cache_test.py:37: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > apache_beam\runners\interactive\interactive_environment.py:119: in new_env > _interactive_beam_env.cleanup() > apache_beam\runners\interactive\interactive_environment.py:269: in cleanup > self.evict_recording_manager(pipeline) > apache_beam\runners\interactive\interactive_environment.py:396: in > evict_recording_manager > rm.clear() > apache_beam\runners\interactive\recording_manager.py:339: in clear > cache_manager.cleanup() > apache_beam\runners\interactive\caching\streaming_cache.py:385: in cleanup > shutil.rmtree(self._cache_dir) > c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree > return _rmtree_unsafe(path, onerror) > c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in > _rmtree_unsafe > _rmtree_unsafe(fullname, onerror) > c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in > _rmtree_unsafe > onerror(os.unlink, fullname, 
sys.exc_info()) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > path = > 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full' > onerror = .onerror at 0x0244BA770A60> > def _rmtree_unsafe(path, onerror): > try: > if os.path.islink(path): > # symlinks to directories are forbidden, see bug #1669 > raise OSError("Cannot call rmtree on a symbolic link") > except OSError: > onerror(os.path.islink, path, sys.exc_info()) > # can't continue even if onerror hook returns > return > names = [] > try: > names = os.listdir(path) > except OSError: > onerror(os.listdir, path, sys.exc_info()) > for name in names: > fullname = os.path.join(path, name) > try: > mode = os.lstat(fullname).st_mode > except OSError: > mode = 0 > if stat.S_ISDIR(mode): > _rmtree_unsafe(fullname, onerror) > else: > try: > > os.unlink(fullname) > E PermissionError: [WinError 32] The process cannot access > the file because it is being used by another process: > 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full\\dacc5c76b6-2494199456376-2494201991240-2494199486000' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
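The traceback pattern above (a cache file still held open while shutil.rmtree runs) is a classic Windows flake. One common workaround, sketched here as a general pattern and not the fix that PR#14558 actually implements, is to clear the read-only bit and retry briefly:

```python
# Hypothetical workaround for the PermissionError above: on Windows, rmtree
# fails if another process (or an unclosed writer) still holds a file handle.
# Clearing the read-only bit and retrying after a short delay usually clears
# transient holds. This is an illustrative pattern, not the actual Beam fix.
import os
import shutil
import stat
import tempfile
import time

def rmtree_with_retries(path: str, attempts: int = 3, delay: float = 0.1) -> None:
    def _on_error(func, p, exc_info):
        os.chmod(p, stat.S_IWRITE)  # drop read-only bit, then retry the failed op
        func(p)

    for attempt in range(attempts):
        try:
            shutil.rmtree(path, onerror=_on_error)
            return
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # give the other holder a moment to release

cache_dir = tempfile.mkdtemp()
open(os.path.join(cache_dir, "cache-file"), "w").close()
rmtree_with_retries(cache_dir)
print(os.path.exists(cache_dir))  # False: the cache directory is gone
```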
[jira] [Updated] (BEAM-12179) PortableRunnerTestWithExternalEnv failing
[ https://issues.apache.org/jira/browse/BEAM-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-12179: - Description: The test seems to be failing on Ubuntu platform for multiple PRs. An [example|https://github.com/apache/beam/pull/14558/checks?check_run_id=2358281557]. A similar issue in the past is BEAM-8646. {code} __ PortableRunnerTestWithExternalEnv.test_assert_that __ [gw1] linux -- Python 3.8.9 /home/runner/work/beam/beam/sdks/python/target/.tox/py38/bin/python self = def test_assert_that(self): # TODO: figure out a way for fn_api_runner to parse and raise the # underlying exception. with self.assertRaisesRegex(Exception, 'Failed assert'): with self.create_pipeline() as p: > assert_that(p | beam.Create(['a', 'b']), equal_to(['a'])) E AssertionError: "Failed assert" does not match "<_MultiThreadedRendezvous of RPC that terminated with: E status = StatusCode.DEADLINE_EXCEEDED E details = "Deadline Exceeded" E debug_error_string = "{"created":"@1618536797.894329055","description":"Error received from peer ipv6:[::1]:43041","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Deadline Exceeded","grpc_status":4}" E >" apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:110: AssertionError {code} was: The test seems to be failing on Ubuntu platform for multiple PRs. A similar issue in the past is BEAM-8646. {code} __ PortableRunnerTestWithExternalEnv.test_assert_that __ [gw1] linux -- Python 3.8.9 /home/runner/work/beam/beam/sdks/python/target/.tox/py38/bin/python self = def test_assert_that(self): # TODO: figure out a way for fn_api_runner to parse and raise the # underlying exception. 
with self.assertRaisesRegex(Exception, 'Failed assert'): with self.create_pipeline() as p: > assert_that(p | beam.Create(['a', 'b']), equal_to(['a'])) E AssertionError: "Failed assert" does not match "<_MultiThreadedRendezvous of RPC that terminated with: E status = StatusCode.DEADLINE_EXCEEDED E details = "Deadline Exceeded" E debug_error_string = "{"created":"@1618536797.894329055","description":"Error received from peer ipv6:[::1]:43041","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Deadline Exceeded","grpc_status":4}" E >" apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:110: AssertionError {code} > PortableRunnerTestWithExternalEnv failing > - > > Key: BEAM-12179 > URL: https://issues.apache.org/jira/browse/BEAM-12179 > Project: Beam > Issue Type: Improvement > Components: test-failures >Reporter: Ning Kang >Priority: P2 > > The test seems to be failing on Ubuntu platform for multiple PRs. An > [example|https://github.com/apache/beam/pull/14558/checks?check_run_id=2358281557]. > A similar issue in the past is BEAM-8646. > {code} > __ PortableRunnerTestWithExternalEnv.test_assert_that > __ > [gw1] linux -- Python 3.8.9 > /home/runner/work/beam/beam/sdks/python/target/.tox/py38/bin/python > self = > testMethod=test_assert_that> > def test_assert_that(self): > # TODO: figure out a way for fn_api_runner to parse and raise the > # underlying exception. 
> with self.assertRaisesRegex(Exception, 'Failed assert'): > with self.create_pipeline() as p: > > assert_that(p | beam.Create(['a', 'b']), equal_to(['a'])) > E AssertionError: "Failed assert" does not match > "<_MultiThreadedRendezvous of RPC that terminated with: > E status = StatusCode.DEADLINE_EXCEEDED > E details = "Deadline Exceeded" > E debug_error_string = > "{"created":"@1618536797.894329055","description":"Error received from peer > ipv6:[::1]:43041","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Deadline > Exceeded","grpc_status":4}" > E >" > apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:110: > AssertionError > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-12179) PortableRunnerTestWithExternalEnv failing
[ https://issues.apache.org/jira/browse/BEAM-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-12179: - Issue Type: Bug (was: Improvement) (Issue description and traceback quoted in the "Updated" message above.)
[jira] [Created] (BEAM-12179) PortableRunnerTestWithExternalEnv failing
Ning Kang created BEAM-12179: Summary: PortableRunnerTestWithExternalEnv failing Key: BEAM-12179 URL: https://issues.apache.org/jira/browse/BEAM-12179 Project: Beam Issue Type: Improvement Components: test-failures Reporter: Ning Kang The test seems to be failing on the Ubuntu platform for multiple PRs. A similar issue in the past is BEAM-8646. The full failure trace is quoted in the issue update above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-12178) ReadCacheTest flakes on Windows
[ https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322529#comment-17322529 ] Ning Kang commented on BEAM-12178: -- I searched the whole code base; there are only 3 occurrences of `ie.current_env().set_recording_manager`: https://github.com/apache/beam/search?q=set_recording_manager One of them started a background recording job and added that recording manager to the global instance ie.current_env(). I'll remove that occurrence. > ReadCacheTest flakes on Windows > --- > > Key: BEAM-12178 > URL: https://issues.apache.org/jira/browse/BEAM-12178 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, test-failures >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P1 > Labels: currently-failing, flake > > Example failure: > https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757 > {code} > ReadCacheTest.test_read_cache > > [gw4] win32 -- Python 3.6.8 > d:\a\beam\beam\sdks\python\target\.tox\py36-win\scripts\python.exe > self = testMethod=test_read_cache> > def setUp(self): > > ie.new_env() > apache_beam\runners\interactive\caching\read_cache_test.py:37: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > apache_beam\runners\interactive\interactive_environment.py:119: in new_env > _interactive_beam_env.cleanup() > apache_beam\runners\interactive\interactive_environment.py:269: in cleanup > self.evict_recording_manager(pipeline) > apache_beam\runners\interactive\interactive_environment.py:396: in > evict_recording_manager > rm.clear() > apache_beam\runners\interactive\recording_manager.py:339: in clear > cache_manager.cleanup() > apache_beam\runners\interactive\caching\streaming_cache.py:385: in cleanup > shutil.rmtree(self._cache_dir) > c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree > return _rmtree_unsafe(path, onerror) > c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in 
> _rmtree_unsafe > _rmtree_unsafe(fullname, onerror) > c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in > _rmtree_unsafe > onerror(os.unlink, fullname, sys.exc_info()) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > path = > 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full' > onerror = .onerror at 0x0244BA770A60> > def _rmtree_unsafe(path, onerror): > try: > if os.path.islink(path): > # symlinks to directories are forbidden, see bug #1669 > raise OSError("Cannot call rmtree on a symbolic link") > except OSError: > onerror(os.path.islink, path, sys.exc_info()) > # can't continue even if onerror hook returns > return > names = [] > try: > names = os.listdir(path) > except OSError: > onerror(os.listdir, path, sys.exc_info()) > for name in names: > fullname = os.path.join(path, name) > try: > mode = os.lstat(fullname).st_mode > except OSError: > mode = 0 > if stat.S_ISDIR(mode): > _rmtree_unsafe(fullname, onerror) > else: > try: > > os.unlink(fullname) > E PermissionError: [WinError 32] The process cannot access > the file because it is being used by another process: > 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full\\dacc5c76b6-2494199456376-2494201991240-2494199486000' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-12178) ReadCacheTest flakes on Windows
[ https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322524#comment-17322524 ] Ning Kang commented on BEAM-12178: -- I think the root cause is that on Windows, shutil.rmtree cannot delete a directory containing files that are still open in other processes. In StreamingCache, there is a place where opening and closing a file are decoupled: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py#L116. The test itself is not flaky; the flakiness comes from the cleanup of the current interactive environment after the previous test execution, where some of the opened file handles are not closed. I'll do some further investigation. > ReadCacheTest flakes on Windows > --- > > Key: BEAM-12178 > URL: https://issues.apache.org/jira/browse/BEAM-12178 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, test-failures >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P1 > Labels: currently-failing, flake > > Example failure: > https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757
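The root-cause hypothesis above (an open handle blocking directory removal) can be illustrated with a small, platform-neutral sketch; the directory and file names are made up, and the safe pattern is simply to close every handle before cleanup:

```python
import os
import shutil
import tempfile

# Sketch of the failure mode: on Windows, shutil.rmtree raises
# PermissionError (WinError 32) if a file inside the tree is still held
# open by another handle. Closing handles before cleanup avoids it on
# every platform.
cache_dir = tempfile.mkdtemp(prefix='it-')
cache_file = os.path.join(cache_dir, 'full')

handle = open(cache_file, 'w')  # "open" decoupled from "close"
handle.write('cached record')

handle.close()                  # close first; then rmtree is safe
shutil.rmtree(cache_dir)
print(os.path.exists(cache_dir))  # -> False
```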
[jira] [Work started] (BEAM-12178) ReadCacheTest flakes on Windows
[ https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-12178 started by Ning Kang. > ReadCacheTest flakes on Windows > --- > > Key: BEAM-12178 > URL: https://issues.apache.org/jira/browse/BEAM-12178 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, test-failures >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P1 > Labels: currently-failing, flake
[jira] [Assigned] (BEAM-12178) ReadCacheTest flakes on Windows
[ https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reassigned BEAM-12178: Assignee: Ning Kang > ReadCacheTest flakes on Windows > --- > > Key: BEAM-12178 > URL: https://issues.apache.org/jira/browse/BEAM-12178 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, test-failures >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P1 > Labels: currently-failing, flake
[jira] [Commented] (BEAM-12178) ReadCacheTest flakes on Windows
[ https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322499#comment-17322499 ] Ning Kang commented on BEAM-12178: -- I think it's something with Python file IO on Windows. Some similar issues: https://stackoverflow.com/questions/33322932/windowserror-error-32-the-process-cannot-access-the-file-because-it-is-being. I'd suggest temporarily disabling the test on Windows; let me make a change to do so. > ReadCacheTest flakes on Windows > --- > > Key: BEAM-12178 > URL: https://issues.apache.org/jira/browse/BEAM-12178 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, test-failures >Reporter: Brian Hulette >Priority: P1 > Labels: currently-failing, flake > > Example failure: > https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757
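The suggested mitigation (skip on Windows) can be sketched with unittest's platform guard; the test class and reason string below are placeholders, not the actual ReadCacheTest code:

```python
import sys
import unittest

class ReadCacheTestSketch(unittest.TestCase):
    # Skipped only when running under Windows; runs normally elsewhere.
    @unittest.skipIf(
        sys.platform == 'win32',
        'BEAM-12178: environment cleanup leaks file handles on Windows')
    def test_read_cache(self):
        self.assertTrue(True)

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ReadCacheTestSketch))
```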
[jira] [Commented] (BEAM-12165) ParquetIO sink should allow to pass an Avro data model
[ https://issues.apache.org/jira/browse/BEAM-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321209#comment-17321209 ] Ning Kang commented on BEAM-12165: -- I closed Pull Request #14533 since I don't have much knowledge about the package, nor about Parquet or Avro. I'll leave the issue in better hands for a proper fix. > ParquetIO sink should allow to pass an Avro data model > -- > > Key: BEAM-12165 > URL: https://issues.apache.org/jira/browse/BEAM-12165 > Project: Beam > Issue Type: Improvement > Components: io-java-parquet >Reporter: Ning Kang >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > The AvroParquetWriter instantiated in ParquetIO [1] does not specify the data > model. > The default is the SpecificData model [2], while the AvroParquetReader reads > with a GenericData model [3]. > ParquetIO should pass in the correct data model. > [1] > https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1052 > [2] > https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroParquetWriter.java#L163 > [3] > https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L704
[jira] [Assigned] (BEAM-12165) ParquetIO sink should allow to pass an Avro data model
[ https://issues.apache.org/jira/browse/BEAM-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reassigned BEAM-12165: Assignee: (was: Ning Kang) > ParquetIO sink should allow to pass an Avro data model > -- > > Key: BEAM-12165 > URL: https://issues.apache.org/jira/browse/BEAM-12165 > Project: Beam > Issue Type: Improvement > Components: io-java-parquet >Reporter: Ning Kang >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h
[jira] [Created] (BEAM-12165) ParquetIO does not pass in correct Avro data model
Ning Kang created BEAM-12165: Summary: ParquetIO does not pass in correct Avro data model Key: BEAM-12165 URL: https://issues.apache.org/jira/browse/BEAM-12165 Project: Beam Issue Type: Improvement Components: io-java-parquet Reporter: Ning Kang Assignee: Ning Kang AvroParquetWriter instantiated in ParquetIO [1] does not specify the data model. The default is SpecificData model [2], while the AvroParquetReader is reading with a GenericData model [3]. ParquetIO should pass in the correct data model. [1] https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1052 [2] https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroParquetWriter.java#L163 [3] https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L704 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-12096) Flake: test_progress_in_HTML_JS_when_in_notebook
[ https://issues.apache.org/jira/browse/BEAM-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-12096: - Description: https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/18008/testReport/junit/apache_beam.runners.interactive.utils_test/ProgressIndicatorTest/test_progress_in_HTML_JS_when_in_notebook/ def test_progress_in_HTML_JS_when_in_notebook(self): ie.current_env()._is_in_notebook = True pi_path = 'apache_beam.runners.interactive.utils.ProgressIndicator' with patch('IPython.core.display.HTML') as mocked_html, \ patch('IPython.core.display.Javascript') as mocked_javascript, \ patch(pi_path + '.spinner_template') as enter_template, \ patch(pi_path + '.spinner_removal_template') as exit_template: enter_template.format.return_value = 'enter' exit_template.format.return_value = 'exit' @utils.progress_indicated def progress_indicated_dummy(): mocked_html.assert_any_call('enter') > progress_indicated_dummy() apache_beam/runners/interactive/utils_test.py:217: apache_beam/runners/interactive/utils.py:235: in run_within_progress_indicator return func(*args, **kwargs) apache_beam/runners/interactive/utils_test.py:215: in progress_indicated_dummy mocked_html.assert_any_call('enter') The calls actually recorded on the mock were the Bootstrap CSS/JS injection (https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...) and the warning 'You have limited Interactive Beam features since your ipython kernel is not connected any notebook frontend.', so the expected call was not found: E AssertionError: HTML('enter') call not found /usr/lib/python3.7/unittest/mock.py:891: AssertionError -- This message was sent by Atlassian Jira (v8.3.4#803005)
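The flaking assertion is unittest.mock's assert_any_call, which passes only if the mock was ever called with exactly those arguments; in the failing run the mock had only recorded the Bootstrap-injection and warning calls. A minimal illustration (the call strings are stand-ins for the real HTML payloads):

```python
from unittest import mock

mocked_html = mock.Mock()
mocked_html('<link rel="stylesheet" href="bootstrap.min.css">')  # injection
mocked_html('enter')                                             # expected call

# assert_any_call passes because 'enter' appears somewhere in the call
# history; in the flaky run this call was missing, so it raised
# "AssertionError: HTML('enter') call not found".
mocked_html.assert_any_call('enter')
print(mocked_html.call_count)  # -> 2
```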
[jira] [Work started] (BEAM-12096) Flake: test_progress_in_HTML_JS_when_in_notebook
[ https://issues.apache.org/jira/browse/BEAM-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-12096 started by Ning Kang. > Flake: test_progress_in_HTML_JS_when_in_notebook > > > Key: BEAM-12096 > URL: https://issues.apache.org/jira/browse/BEAM-12096 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Ning Kang >Priority: P1 > Labels: flaky-test > > https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/18008/testReport/junit/apache_beam.runners.interactive.utils_test/ProgressIndicatorTest/test_progress_in_HTML_JS_when_in_notebook/
[jira] [Commented] (BEAM-10514) Make sure Interactive Beam cache file path length does not exceed OS limits
[ https://issues.apache.org/jira/browse/BEAM-10514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314971#comment-17314971 ] Ning Kang commented on BEAM-10514: -- This should have been fixed. > Make sure Interactive Beam cache file path length does not exceed OS limits > --- > > Key: BEAM-10514 > URL: https://issues.apache.org/jira/browse/BEAM-10514 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Priority: P3 > Time Spent: 1h > Remaining Estimate: 0h > > The path length limit on Linux is 4096. > The limit in the Windows API is 260. > If long file name support is enabled on Windows, the limit is 32767. > An example path that doesn't work on Windows is > c:\windows\temp\interactive-temp-xqj_fv471021776\cache-20-07-16-08_28_57\full\beam-temp-anonymous_pcollection_433934912-433934912-433938272-471021776-65e3a91ec73e11ea91b7e69191fc0bf0\284e0d91-ca8a-463c-a894-bde70cdec599.anonymous_pcollection_433934912-433934912-433938272-471021776 > (length: 281). > Consider using obfuscation when mapping in-memory PCollections into files. > If long file names are needed, consider storing a reverse-indexed mapping in > memory.
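The obfuscation idea in the description can be sketched by hashing the long PCollection-derived name into a fixed-length digest; the function name and cache root below are illustrative, not the actual Interactive Beam implementation:

```python
import hashlib
import os

def obfuscated_cache_path(cache_root, cache_key):
    # A 40-character sha1 hex digest replaces the arbitrarily long key,
    # keeping the full path well under the 260-character Windows limit.
    digest = hashlib.sha1(cache_key.encode('utf-8')).hexdigest()
    return os.path.join(cache_root, digest)

# Deliberately long key, mimicking the anonymous-PCollection names above.
long_key = ('anonymous_pcollection_433934912-433934912-433938272-471021776-'
            '65e3a91ec73e11ea91b7e69191fc0bf0') * 4
path = obfuscated_cache_path(r'c:\windows\temp\interactive-temp', long_key)
print(len(path) < 260)  # -> True
```

A reverse index (digest back to the original name) would have to live in memory or in a sidecar file, as the description notes.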
[jira] [Updated] (BEAM-10514) Make sure Interactive Beam cache file path length does not exceed OS limits
[ https://issues.apache.org/jira/browse/BEAM-10514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10514: - Resolution: Fixed Status: Resolved (was: Open) > Make sure Interactive Beam cache file path length does not exceed OS limits > --- > > Key: BEAM-10514 > URL: https://issues.apache.org/jira/browse/BEAM-10514 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Priority: P3 > Time Spent: 1h > Remaining Estimate: 0h
[jira] [Commented] (BEAM-9630) Python SDK streaming from Pubsub getting error `grpc.StatusRuntimeException: CANCELLED: call already cancelled`
[ https://issues.apache.org/jira/browse/BEAM-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314095#comment-17314095 ] Ning Kang commented on BEAM-9630: - Is this something a user can ignore? A similar question is asked [here](https://stackoverflow.com/questions/66893912/apache-beam-statusruntimeexception-on-dataflow-pipeline). > Python SDK streaming from Pubsub getting error `grpc.StatusRuntimeException: > CANCELLED: call already cancelled` > --- > > Key: BEAM-9630 > URL: https://issues.apache.org/jira/browse/BEAM-9630 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.19.0 > Environment: python==3.7.5 > apache-beam[gcp]==2.19.0 > google-cloud-pubsub==1.4.2 >Reporter: Kan Dong >Priority: P2 > Fix For: Not applicable > > > I have a dataflow streaming job using Apache Beam Python 3.7 SDK 2.19.0. The > job consumes pubsub messages, treat data and publish to pubsub as output. > Periodically, I would get the below error messages and the worker would stop > consuming messages. 
> ```Error message from worker: java.util.concurrent.ExecutionException: > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: > CANCELLED: cancelled before receiving half close > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57) > org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:332) > > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85) > > org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125) > > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1350) > > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1100(StreamingDataflowWorker.java:152) > > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$7.run(StreamingDataflowWorker.java:1073) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) Caused by: > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: > CANCELLED: cancelled before receiving half close > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:273) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23) > > 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:337) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:793) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: > CANCELLED: call already cancelled > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524) > > org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339) > > org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98) > > org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84) > > org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.finish(RemoteGrpcPortWriteOperation.java:21
[jira] [Commented] (BEAM-11797) Flaky interactive test in Precommit
[ https://issues.apache.org/jira/browse/BEAM-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311758#comment-17311758 ] Ning Kang commented on BEAM-11797: -- Found the root cause: `assert_called_with` does not mean "assert the mock has been called with these arguments" but "assert the most recent call used these arguments". In addition, there is an IPythonLogHandler that emits logs from time to time, which is nondeterministic when running in a Jenkins environment. That handler uses the same display module as the decorator under test, causing the flakiness. Replaced `assert_called_with` with `assert_any_call` to fix it. > Flaky interactive test in Precommit > --- > > Key: BEAM-11797 > URL: https://issues.apache.org/jira/browse/BEAM-11797 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Ning Kang >Priority: P2 > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
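The distinction between the two mock assertions is easy to reproduce with the stdlib `unittest.mock` alone:

```python
from unittest import mock

m = mock.Mock()
m('expected')   # the call the test actually cares about
m('noise')      # e.g. a log handler hitting the same display module later

# assert_called_with only checks the MOST RECENT call, so it fails here:
failed = False
try:
    m.assert_called_with('expected')
except AssertionError:
    failed = True
assert failed  # raised, because the last call was m('noise')

# assert_any_call passes if ANY recorded call matches, which is what the
# flaky test actually intended to check:
m.assert_any_call('expected')
```

Any nondeterministic extra call after the interesting one (exactly what the log handler did) breaks `assert_called_with` but leaves `assert_any_call` stable.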
[jira] [Commented] (BEAM-11797) Flaky interactive test in Precommit
[ https://issues.apache.org/jira/browse/BEAM-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300537#comment-17300537 ] Ning Kang commented on BEAM-11797: -- Working on it now. > Flaky interactive test in Precommit > --- > > Key: BEAM-11797 > URL: https://issues.apache.org/jira/browse/BEAM-11797 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Ning Kang >Priority: P2 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8288) Cleanup Interactive Beam Python 2 support
[ https://issues.apache.org/jira/browse/BEAM-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-8288: Resolution: Fixed Status: Resolved (was: Open) > Cleanup Interactive Beam Python 2 support > - > > Key: BEAM-8288 > URL: https://issues.apache.org/jira/browse/BEAM-8288 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Priority: P3 > Time Spent: 1.5h > Remaining Estimate: 0h > > As Beam is retiring Python 2, some special handling in Interactive Beam code > and tests will need to be cleaned up. > This Jira ticket tracks that cleanup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8288) Cleanup Interactive Beam Python 2 support
[ https://issues.apache.org/jira/browse/BEAM-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300536#comment-17300536 ] Ning Kang commented on BEAM-8288: - [~yoshiki.obata] Thanks, closing it. > Cleanup Interactive Beam Python 2 support > - > > Key: BEAM-8288 > URL: https://issues.apache.org/jira/browse/BEAM-8288 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Priority: P3 > Time Spent: 1.5h > Remaining Estimate: 0h > > As Beam is retiring Python 2, some special handling in Interactive Beam code > and tests will need to be cleaned up. > This Jira ticket tracks that cleanup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-11906) No trigger early repeatedly for session windows
Ning Kang created BEAM-11906: Summary: No trigger early repeatedly for session windows Key: BEAM-11906 URL: https://issues.apache.org/jira/browse/BEAM-11906 Project: Beam Issue Type: Improvement Components: runner-dataflow Affects Versions: 2.28.0, 2.23.0 Reporter: Ning Kang Originated from: https://stackoverflow.com/questions/66381608/apache-beam-does-not-trigger-early-repeatedly-for-session-windows-on-google-data The following pipeline fires early after each element when running locally using DirectRunner, but there are no early triggers when running on google cloud dataflow. On dataflow it triggers only after the session window has closed. {code:python} ( p | 'read' >> beam.io.ReadFromPubSub(subscription = 'projects/xxx/subscriptions/xxx-sub') | 'json' >> beam.Map(lambda x: json.loads(x.decode('utf-8'))) | 'kv' >> beam.Map(lambda x: (x['id'], x['amount'])) | 'window' >> beam.WindowInto(window.Sessions(15*60), trigger=trigger.Repeatedly(trigger.AfterCount(1)), accumulation_mode=AccumulationMode.ACCUMULATING) | 'group' >> beam.GroupByKey() | 'log'>> beam.Map(lambda x: logging.info(x)) ) {code} Apache Beam versions tried: 2.23 and 2.28. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-11666) apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution is flaky
[ https://issues.apache.org/jira/browse/BEAM-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reassigned BEAM-11666: Assignee: Sam Rohde (was: Ning Kang) > apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution > is flaky > - > > Key: BEAM-11666 > URL: https://issues.apache.org/jira/browse/BEAM-11666 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Valentyn Tymofieiev >Assignee: Sam Rohde >Priority: P1 > Labels: stale-assigned > > Happened in: https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/16819 > {noformat} > self = > testMethod=test_basic_execution> > @unittest.skipIf( > sys.version_info < (3, 6, 0), > 'This test requires at least Python 3.6 to work.') > def test_basic_execution(self): > """A basic pipeline to be used as a smoke test.""" > > # Create the pipeline that will emit 0, 1, 2. > p = beam.Pipeline(InteractiveRunner()) > numbers = p | 'numbers' >> beam.Create([0, 1, 2]) > letters = p | 'letters' >> beam.Create(['a', 'b', 'c']) > > # Watch the pipeline and PCollections. This is normally done in a > notebook > # environment automatically, but we have to do it manually here. > ib.watch(locals()) > ie.current_env().track_user_pipelines() > > # Create the recording objects. By calling `record` a new > PipelineFragment > # is started to compute the given PCollections and cache to disk. 
> rm = RecordingManager(p) > > numbers_recording = rm.record([numbers], max_n=3, max_duration=500) > apache_beam/runners/interactive/recording_manager_test.py:331: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > apache_beam/runners/interactive/recording_manager.py:435: in record > self._clear(pipeline_instrument) > apache_beam/runners/interactive/recording_manager.py:319: in _clear > self._clear_pcolls(cache_manager, set(to_clear)) > apache_beam/runners/interactive/recording_manager.py:323: in _clear_pcolls > cache_manager.clear('full', pc) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > object at 0x7fa3903ac208> > labels = ('full', > 'ee5c35ce3d-140340882711664-140340882712560-140340476166608') > def clear(self, *labels): > # type (*str) -> Boolean > > """Clears the cache entry of the given labels and returns True on > success. > > Args: > value: An encodable (with corresponding PCoder) value > *labels: List of labels for PCollection instance > """ > > raise NotImplementedError > E NotImplementedError > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-11588) beam_PreCommit_PythonDocs_Cron failing
[ https://issues.apache.org/jira/browse/BEAM-11588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11588: - Resolution: Fixed Status: Resolved (was: Open) Marking the issue as fixed. > beam_PreCommit_PythonDocs_Cron failing > -- > > Key: BEAM-11588 > URL: https://issues.apache.org/jira/browse/BEAM-11588 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Ahmet Altay >Assignee: Ning Kang >Priority: P0 > Time Spent: 2h 40m > Remaining Estimate: 0h > > Error: 04:13:26 jupyter-client 6.1.10 has requirement jedi<=0.17.2, but you > have jedi 0.18.0. > Example Log: > https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/584/console > It seems this happened because a new version of jupyter-client was released > that is not compatible with jedi 0.18.0 (more here: > https://github.com/jupyter/jupyter_client/issues/597) > Attempting to put an upper limit on jupyter-client did not work > (https://github.com/apache/beam/pull/13709) because ipykernel installs the latest > version of jupyter-client first. The current dependency tree looks like > (omitting unrelated parts): > ipykernel==5.4.2 > - ipython [required: >=5.0.0, installed: 7.19.0] > - jedi [required: >=0.10, installed: 0.17.0] > - jupyter-client [required: Any, installed: 6.1.10] > - jedi [required: <=0.17.2, installed: 0.17.0] > Potential solutions: > - Force install jedi <= 0.17.2 > - Force install jupyter-client <= 6.1.7 > Also, we can probably remove jupyter-client as an explicit dependency, since > ipykernel already depends on it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
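One way to express the "force install" options listed in the issue is a pip constraints file; this is a sketch of the workaround idea, not the fix that actually landed:

```
# constraints.txt -- pin the conflicting packages to the versions named above
jedi<=0.17.2
jupyter-client<=6.1.7
```

Installing with `pip install -r requirements.txt -c constraints.txt` then caps both packages for the whole resolution, including the transitive install that ipykernel triggers first.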
[jira] [Commented] (BEAM-11476) flaky test: test_track_user_pipeline_cleanup_non_inspectable_pipeline
[ https://issues.apache.org/jira/browse/BEAM-11476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277528#comment-17277528 ] Ning Kang commented on BEAM-11476: -- Thanks Brian, the flaky (failed) test is here: https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/17094/testReport/junit/apache_beam.runners.interactive.interactive_environment_test/InteractiveEnvironmentTest/test_track_user_pipeline_cleanup_non_inspectable_pipeline_2/ The `Standard Errors` are warning logs. I'll see if I can disable the log handler in tests. > flaky test: test_track_user_pipeline_cleanup_non_inspectable_pipeline > - > > Key: BEAM-11476 > URL: https://issues.apache.org/jira/browse/BEAM-11476 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Ning Kang >Priority: P1 > Labels: currently-failing, flake > > Test name: > apache_beam.runners.interactive.interactive_environment_test.InteractiveEnvironmentTest.test_track_user_pipeline_cleanup_non_inspectable_pipeline > {code} > Error Message > AssertionError: Expected 'cleanup' to not have been called. Called 1 times. 
> Stacktrace > self = > testMethod=test_track_user_pipeline_cleanup_non_inspectable_pipeline> > mocked_cleanup = > @patch( > 'apache_beam.runners.interactive.interactive_environment' > '.InteractiveEnvironment.cleanup') > def test_track_user_pipeline_cleanup_non_inspectable_pipeline( > self, mocked_cleanup): > ie._interactive_beam_env = None > ie.new_env() > dummy_pipeline_1 = beam.Pipeline() > dummy_pipeline_2 = beam.Pipeline() > dummy_pipeline_3 = beam.Pipeline() > dummy_pipeline_4 = beam.Pipeline() > dummy_pcoll = dummy_pipeline_4 | beam.Create([1]) > dummy_pipeline_5 = beam.Pipeline() > dummy_non_inspectable_pipeline = 'dummy' > ie.current_env().watch(locals()) > from apache_beam.runners.interactive.background_caching_job import > BackgroundCachingJob > ie.current_env().set_background_caching_job( > dummy_pipeline_1, > BackgroundCachingJob( > runner.PipelineResult(runner.PipelineState.DONE), limiters=[])) > ie.current_env().set_test_stream_service_controller(dummy_pipeline_2, > None) > ie.current_env().set_cache_manager( > cache.FileBasedCacheManager(), dummy_pipeline_3) > ie.current_env().mark_pcollection_computed([dummy_pcoll]) > ie.current_env().set_cached_source_signature( > dummy_non_inspectable_pipeline, None) > ie.current_env().set_pipeline_result( > dummy_pipeline_5, > runner.PipelineResult(runner.PipelineState.RUNNING)) > > mocked_cleanup.assert_not_called() > apache_beam/runners/interactive/interactive_environment_test.py:265: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > _mock_self = > def assert_not_called(_mock_self): > """assert that the mock was never called. > """ > self = _mock_self > if self.call_count != 0: > msg = ("Expected '%s' to not have been called. Called %s times." % >(self._mock_name or 'mock', self.call_count)) > > raise AssertionError(msg) > E AssertionError: Expected 'cleanup' to not have been called. > Called 1 times. 
> /usr/lib/python3.7/unittest/mock.py:792: AssertionError > {code} > https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/3609/ > https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/3607/ > https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/3593/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
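Disabling the offending log handler for the duration of a test, as suggested in the comment above, can be done with the stdlib `logging` module. A hedged sketch (the context-manager name is made up; the logger name to pass would be whatever the interactive module logs under):

```python
import logging
from contextlib import contextmanager

@contextmanager
def suppressed_handlers(logger_name):
    """Temporarily detach every handler from a logger so that background
    log emission (e.g. a notebook display handler) cannot trip
    mock-based assertions while the test body runs."""
    logger = logging.getLogger(logger_name)
    saved = logger.handlers[:]
    logger.handlers = []
    try:
        yield logger
    finally:
        logger.handlers = saved
```

Wrapping the test body in `with suppressed_handlers('apache_beam...'):` (module path illustrative) removes the nondeterministic extra calls that made `cleanup` appear to have been invoked.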
[jira] [Updated] (BEAM-11705) Write to bigquery always assigns unique insert id per row causing performance issue
[ https://issues.apache.org/jira/browse/BEAM-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11705: - Description: The `ignore_insert_id` argument in BigQuery IO Connector https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 does not take effect. Because the implementation of sending insert rows request always uses an auto generated uuid even when the insert_ids is set to None when `ignore_insert_id` is True: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 The implementation should explicitly set insert_id as None instead of using a generated uuid, an example: https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 An unique insert id per row would make the streaming inserts very slow. Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any effect when `ignore_insert_id` is True in the implementation because it skipped the `ReshufflePerKey` (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1422). When `ignore_insert_id` is True, we seem to lost the batch size control? was: The `ignore_insert_id` argument in BigQuery IO Connector https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 does not take effect. Because the implementation of sending insert rows request always uses an auto generated uuid even when the insert_ids is set to None when `ignore_insert_id` is True: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 The implementation should explicitly set insert_id as None instead of using a generated uuid, an example: https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 An unique insert id per row would make the streaming inserts very slow. 
Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any effect when `ignore_insert_id` is True in the implementation because it skipped the `ReshufflePerKey`. When `ignore_insert_id` is True, we seem to lost the batch size control? > Write to bigquery always assigns unique insert id per row causing performance > issue > --- > > Key: BEAM-11705 > URL: https://issues.apache.org/jira/browse/BEAM-11705 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Ning Kang >Assignee: Pablo Estrada >Priority: P2 > > The `ignore_insert_id` argument in BigQuery IO Connector > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 > does not take effect. > Because the implementation of sending insert rows request always uses an auto > generated uuid even when the insert_ids is set to None when > `ignore_insert_id` is True: > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 > The implementation should explicitly set insert_id as None instead of using a > generated uuid, an example: > https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 > An unique insert id per row would make the streaming inserts very slow. > Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any > effect when `ignore_insert_id` is True in the implementation because it > skipped the `ReshufflePerKey` > (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1422). > When `ignore_insert_id` is True, we seem to lost the batch size control? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-11705) Write to bigquery always assigns unique insert id per row causing performance issue
[ https://issues.apache.org/jira/browse/BEAM-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11705: - Description: The `ignore_insert_id` argument in BigQuery IO Connector https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 does not take effect. Because the implementation of sending insert rows request always uses an auto generated uuid even when the insert_ids is set to None when `ignore_insert_id` is True: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 The implementation should explicitly set insert_id as None instead of using a generated uuid, an example: https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 An unique insert id per row would make the streaming inserts very slow. Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any effect when `ignore_insert_id` is True in the implementation because it skipped the `ReshufflePerKey`. When `ignore_insert_id` is True, we seem to lost the batch size control? was: The `ignore_insert_id` argument in BigQuery IO Connector https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 does not take effect. Because the implementation of sending insert rows request always uses an auto generated uuid even when the insert_ids is set to None when `ignore_insert_id` is True: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 The implementation should explicitly set insert_id as None instead of using a generated uuid, an example: https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 An unique insert id per row would make the streaming inserts very slow. 
> Write to bigquery always assigns unique insert id per row causing performance > issue > --- > > Key: BEAM-11705 > URL: https://issues.apache.org/jira/browse/BEAM-11705 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Ning Kang >Assignee: Pablo Estrada >Priority: P2 > > The `ignore_insert_id` argument in BigQuery IO Connector > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 > does not take effect. > Because the implementation of sending insert rows request always uses an auto > generated uuid even when the insert_ids is set to None when > `ignore_insert_id` is True: > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 > The implementation should explicitly set insert_id as None instead of using a > generated uuid, an example: > https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 > An unique insert id per row would make the streaming inserts very slow. > Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any > effect when `ignore_insert_id` is True in the implementation because it > skipped the `ReshufflePerKey`. When `ignore_insert_id` is True, we seem to > lost the batch size control? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-11705) Write to bigquery always assigns unique insert id per row causing performance issue
[ https://issues.apache.org/jira/browse/BEAM-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11705: - Description: The `ignore_insert_id` argument in BigQuery IO Connector https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 does not take effect. Because the implementation of sending insert rows request always uses an auto generated uuid even when the insert_ids is set to None when `ignore_insert_id` is True: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 The implementation should explicitly set insert_id as None instead of using a generated uuid, an example: https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 An unique insert id per row would make the streaming inserts very slow. was: The `ignore_insert_id` argument in BigQuery IO Connector https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 does not take effect. Because the implementation of sending insert rows request always uses an auto generated uuid even when the insert_ids is set to None when `ignore_insert_id` is False: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 The implementation should explicitly set insert_id as None instead of using a generated uuid, an example: https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 An unique insert id per row would make the streaming inserts very slow. 
> Write to bigquery always assigns unique insert id per row causing performance > issue > --- > > Key: BEAM-11705 > URL: https://issues.apache.org/jira/browse/BEAM-11705 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Ning Kang >Assignee: Pablo Estrada >Priority: P2 > > The `ignore_insert_id` argument in BigQuery IO Connector > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 > does not take effect. > Because the implementation of sending insert rows request always uses an auto > generated uuid even when the insert_ids is set to None when > `ignore_insert_id` is True: > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 > The implementation should explicitly set insert_id as None instead of using a > generated uuid, an example: > https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 > An unique insert id per row would make the streaming inserts very slow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-11705) Write to bigquery always assigns unique insert id per row causing performance issue
Ning Kang created BEAM-11705: Summary: Write to bigquery always assigns unique insert id per row causing performance issue Key: BEAM-11705 URL: https://issues.apache.org/jira/browse/BEAM-11705 Project: Beam Issue Type: Improvement Components: io-py-gcp Reporter: Ning Kang The `ignore_insert_id` argument in the BigQuery IO connector https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471 does not take effect: the implementation of the insert-rows request always uses an auto-generated uuid, even when insert_ids is set to None because `ignore_insert_id` is True: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062 The implementation should explicitly pass insert_id as None instead of using a generated uuid; an example: https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33 A unique insert id per row makes the streaming inserts very slow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
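The fix described in the issue amounts to omitting the insert id rather than generating a fresh one. A library-free sketch of the intended request shape (`build_insert_payload` is an illustrative helper, not the actual function in `bigquery_tools.py`):

```python
import uuid

def build_insert_payload(rows, ignore_insert_id=False):
    """Pair each row with an insert id, mirroring the shape of a
    streaming-inserts request. With ignore_insert_id=True the id is
    omitted entirely -- not replaced by a generated uuid -- so the
    service can skip best-effort per-row deduplication."""
    payload = []
    for row in rows:
        entry = {'json': row}
        if not ignore_insert_id:
            entry['insertId'] = str(uuid.uuid4())
        payload.append(entry)
    return payload
```

The performance difference comes from the `insertId`-free branch: without per-row ids there is no deduplication bookkeeping on the BigQuery side.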
[jira] [Created] (BEAM-11692) Support DataflowRunner as an underlying runner of InteractiveRunner
Ning Kang created BEAM-11692: Summary: Support DataflowRunner as an underlying runner of InteractiveRunner Key: BEAM-11692 URL: https://issues.apache.org/jira/browse/BEAM-11692 Project: Beam Issue Type: Improvement Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang When using DataflowRunner as the underlying runner with GCS buckets as cache dirs, `ib.show()` and `ib.collect()` misbehave because the cache manager does not wait for the cache files to be written, while a DataflowRunner job running on the Dataflow service takes a long time to complete those writes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
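A fix along these lines requires the cache manager to block until the cache actually materializes. A minimal sketch of such a wait loop (names and defaults are illustrative, not Beam's API):

```python
import os
import time

def wait_for_cache(path, timeout=60.0, poll_interval=1.0,
                   exists=os.path.exists):
    """Block until a cache file appears or the timeout expires.

    The `exists` predicate is injectable so the same loop could poll a
    GCS path (e.g. via Beam's FileSystems) instead of the local disk.
    Returns True if the file showed up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while not exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

With local temp dirs the file is usually there before the first poll, which is why the missing wait only surfaces when a slow remote runner like Dataflow writes the cache.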
[jira] [Commented] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform
[ https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268918#comment-17268918 ] Ning Kang commented on BEAM-10708: -- Thanks for the comments, Cham and Brian! If `from_runner_api` is not a possible solution to copy pipelines, the InteractiveRunner has to either copy the pipeline objects explicitly or make changes directly on the proto before each iteration of execution. > InteractiveRunner cannot execute pipeline with cross-language transform > --- > > Key: BEAM-10708 > URL: https://issues.apache.org/jira/browse/BEAM-10708 > Project: Beam > Issue Type: Bug > Components: cross-language >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P2 > > The InteractiveRunner crashes when given a pipeline that includes a > cross-language transform. > Here's the example I tried to run in a jupyter notebook: > {code:python} > p = beam.Pipeline(InteractiveRunner()) > pc = (p | SqlTransform("""SELECT > CAST(1 AS INT) AS `id`, > CAST('foo' AS VARCHAR) AS `str`, > CAST(3.14 AS DOUBLE) AS `flt`""")) > df = interactive_beam.collect(pc) > {code} > The problem occurs when > [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66] > creates a copy of the pipeline by [writing it to proto and reading it > back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120]. > Reading it back fails because some of the pipeline is not written in Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform
[ https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263715#comment-17263715 ] Ning Kang edited comment on BEAM-10708 at 1/13/21, 2:30 AM: As of today (2021-01-12), when a pipeline including a SqlTransform is executed with InteractiveRunner() or invoked in `from_runner_api(to_runner_api())`, an example failure stack trace looks like the following:
{code:python}
---
ValueError                                Traceback (most recent call last)
in
----> 1 ib.show(pcoll)

~/beam/sdks/python/apache_beam/runners/interactive/utils.py in run_within_progress_indicator(*args, **kwargs)
    226   def run_within_progress_indicator(*args, **kwargs):
    227     with ProgressIndicator('Processing...', 'Done.'):
--> 228       return func(*args, **kwargs)
    229
    230   return run_within_progress_indicator

~/beam/sdks/python/apache_beam/runners/interactive/interactive_beam.py in show(*pcolls, **configs)
    484   recording_manager = ie.current_env().get_recording_manager(
    485       user_pipeline, create_if_absent=True)
--> 486   recording = recording_manager.record(pcolls, max_n=n, max_duration=duration)
    487
    488   # Catch a KeyboardInterrupt to gracefully cancel the recording and

~/beam/sdks/python/apache_beam/runners/interactive/recording_manager.py in record(self, pcolls, max_n, max_duration)
    420   # arbitrary variables.
    421   self._watch(pcolls)
--> 422   pipeline_instrument = pi.PipelineInstrument(self.user_pipeline)
    423   self.record_pipeline()
    424

~/beam/sdks/python/apache_beam/runners/interactive/pipeline_instrument.py in __init__(self, pipeline, options)
    113   # proto is stable. The snapshot of pipeline will not be mutated within this
    114   # module and can be used to recover original pipeline if needed.
--> 115   self._pipeline_snap = beam.pipeline.Pipeline.from_runner_api(
    116       pipeline.to_runner_api(use_fake_coders=True), pipeline.runner, options)
    117   ie.current_env().add_derived_pipeline(self._pipeline, self._pipeline_snap)

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, runner, options, return_context)
    900   if proto.root_transform_ids:
    901     root_transform_id, = proto.root_transform_ids
--> 902     p.transforms_stack = [context.transforms.get_by_id(root_transform_id)]
    903   else:
    904     p.transforms_stack = [AppliedPTransform(None, None, '', None)]

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, id)
    113   # type: (str) -> PortableObjectT
    114   if id not in self._id_to_obj:
--> 115     self._id_to_obj[id] = self._obj_type.from_runner_api(
    116         self._id_to_proto[id], self._pipeline_context)
    117   return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250   result.parts = []
   1251   for transform_id in proto.subtransforms:
-> 1252     part = context.transforms.get_by_id(transform_id)
   1253     part.parent = result
   1254     result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, id)
    113   # type: (str) -> PortableObjectT
    114   if id not in self._id_to_obj:
--> 115     self._id_to_obj[id] = self._obj_type.from_runner_api(
    116         self._id_to_proto[id], self._pipeline_context)
    117   return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250   result.parts = []
   1251   for transform_id in proto.subtransforms:
-> 1252     part = context.transforms.get_by_id(transform_id)
   1253     part.parent = result
   1254     result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, id)
    113   # type: (str) -> PortableObjectT
    114   if id not in self._id_to_obj:
--> 115     self._id_to_obj[id] = self._obj_type.from_runner_api(
    116         self._id_to_proto[id], self._pipeline_context)
    117   return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250   result.parts = []
   1251   for transform_id in proto.subtransforms:
-> 1252     part = context.transforms.get_by_id(transform_id)
   1253     part.parent = result
   1254     result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, id)
    113   # type: (str) -> PortableObjectT
    114   if id not in self._id_to_obj:
--> 115     self._id_to_obj[id] = self._obj_type.from_runner_api(
    116         self._id_to_proto[id], self._pipeline_context)
    117   return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(
[jira] [Updated] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform
[ https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10708: - Priority: P2 (was: P3) > InteractiveRunner cannot execute pipeline with cross-language transform > --- > > Key: BEAM-10708 > URL: https://issues.apache.org/jira/browse/BEAM-10708 > Project: Beam > Issue Type: Bug > Components: cross-language >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P2 > > The InteractiveRunner crashes when given a pipeline that includes a > cross-language transform. > Here's the example I tried to run in a jupyter notebook: > {code:python} > p = beam.Pipeline(InteractiveRunner()) > pc = (p | SqlTransform("""SELECT > CAST(1 AS INT) AS `id`, > CAST('foo' AS VARCHAR) AS `str`, > CAST(3.14 AS DOUBLE) AS `flt`""")) > df = interactive_beam.collect(pc) > {code} > The problem occurs when > [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66] > creates a copy of the pipeline by [writing it to proto and reading it > back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120]. > Reading it back fails because some of the pipeline is not written in Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform
[ https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reassigned BEAM-10708: Assignee: Ning Kang > InteractiveRunner cannot execute pipeline with cross-language transform > --- > > Key: BEAM-10708 > URL: https://issues.apache.org/jira/browse/BEAM-10708 > Project: Beam > Issue Type: Bug > Components: cross-language >Reporter: Brian Hulette >Assignee: Ning Kang >Priority: P3 > > The InteractiveRunner crashes when given a pipeline that includes a > cross-language transform. > Here's the example I tried to run in a jupyter notebook: > {code:python} > p = beam.Pipeline(InteractiveRunner()) > pc = (p | SqlTransform("""SELECT > CAST(1 AS INT) AS `id`, > CAST('foo' AS VARCHAR) AS `str`, > CAST(3.14 AS DOUBLE) AS `flt`""")) > df = interactive_beam.collect(pc) > {code} > The problem occurs when > [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66] > creates a copy of the pipeline by [writing it to proto and reading it > back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120]. > Reading it back fails because some of the pipeline is not written in Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-11362) retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit'
[ https://issues.apache.org/jira/browse/BEAM-11362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-11362 started by Ning Kang. > retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) > AttributeError: 'NoneType' object has no attribute 'visit' > > > Key: BEAM-11362 > URL: https://issues.apache.org/jira/browse/BEAM-11362 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > > Traceback (most recent call last): > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/photos/vision/features/delf/extract/global_descriptor/python/revisited_datasets/beam/eval_utils_test.py", > line 614, in testProduceOutputVisualization > _ = pipeline.run() > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/pipeline.py", > line 553, in run > return self.runner.run_pipeline(self, self._options) > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/interactive_runner.py", > line 136, in run_pipeline > inst.watch_sources(pipeline) > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/pipeline_instrument.py", > line 1008, in watch_sources > retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) > AttributeError: 'NoneType' object has no attribute 'visit' > Probably related to this change: https://github.com/apache/beam/pull/13335 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-11362) retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit'
[ https://issues.apache.org/jira/browse/BEAM-11362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241133#comment-17241133 ] Ning Kang commented on BEAM-11362: -- This is intended behavior after the change in https://github.com/apache/beam/pull/13335. When an InteractiveRunner user defines a pipeline outside the main scope (such as within a function), they have to use `ib.watch({'pipeline_name': pipeline_instance})` or `ib.watch(locals())` to track the pipeline in the interactive environment; otherwise, they will run into this issue. This does not impact Beam notebook users, because they either define pipelines in the main scope (directly in notebook cells) or pass around the pipeline/pcollection objects and use them in `ib.show()`, `ib.show_graph()` or `ib.collect()` (these APIs track the pcollections/pipelines automatically). It only affects users who use the InteractiveRunner purely to materialize PCollections, such as in tests where the pipeline object itself does not matter. The fix is to add an explicit `watch` statement in the same scope after defining the pipeline. 
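A minimal sketch of that contract, with a toy registry standing in for the real interactive environment (none of these names are actual Beam internals):

```python
class Pipeline:
    """Toy stand-in for beam.Pipeline."""
    def visit(self, visitor):
        return 'visited'

# Hypothetical stand-in for the interactive environment's watched variables.
_watched = {}

def watch(variables):
    """Mimics ib.watch({'name': pipeline}) / ib.watch(locals())."""
    for value in variables.values():
        if isinstance(value, Pipeline):
            _watched[id(value)] = value

def retrieve_user_pipeline(pipeline):
    # Returns None for an unwatched pipeline, which is what later blows up
    # as: AttributeError: 'NoneType' object has no attribute 'visit'.
    return _watched.get(id(pipeline))

def build_pipeline():
    p = Pipeline()  # defined inside a function, i.e. not in the main scope
    return p

p = build_pipeline()
unwatched = retrieve_user_pipeline(p)  # None: calling .visit(...) would crash

watch({'p': p})  # the fix: watch in the same scope after defining the pipeline
watched = retrieve_user_pipeline(p)
```

The last two lines mirror the recommended fix: an explicit `watch` in the scope where the pipeline is defined.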
> retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) > AttributeError: 'NoneType' object has no attribute 'visit' > > > Key: BEAM-11362 > URL: https://issues.apache.org/jira/browse/BEAM-11362 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > > Traceback (most recent call last): > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/photos/vision/features/delf/extract/global_descriptor/python/revisited_datasets/beam/eval_utils_test.py", > line 614, in testProduceOutputVisualization > _ = pipeline.run() > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/pipeline.py", > line 553, in run > return self.runner.run_pipeline(self, self._options) > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/interactive_runner.py", > line 136, in run_pipeline > inst.watch_sources(pipeline) > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/pipeline_instrument.py", > line 1008, in watch_sources > retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) > AttributeError: 'NoneType' object has no attribute 'visit' > Probably related to this change: https://github.com/apache/beam/pull/13335 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-11362) retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit'
[ https://issues.apache.org/jira/browse/BEAM-11362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11362: - Status: Resolved (was: In Progress) > retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) > AttributeError: 'NoneType' object has no attribute 'visit' > > > Key: BEAM-11362 > URL: https://issues.apache.org/jira/browse/BEAM-11362 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > > Traceback (most recent call last): > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/photos/vision/features/delf/extract/global_descriptor/python/revisited_datasets/beam/eval_utils_test.py", > line 614, in testProduceOutputVisualization > _ = pipeline.run() > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/pipeline.py", > line 553, in run > return self.runner.run_pipeline(self, self._options) > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/interactive_runner.py", > line 136, in run_pipeline > inst.watch_sources(pipeline) > File > "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/pipeline_instrument.py", > line 1008, in watch_sources > retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) > AttributeError: 'NoneType' object has no attribute 'visit' > Probably related to this change: https://github.com/apache/beam/pull/13335 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-11362) retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit'
Ning Kang created BEAM-11362: Summary: retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit' Key: BEAM-11362 URL: https://issues.apache.org/jira/browse/BEAM-11362 Project: Beam Issue Type: Improvement Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang Traceback (most recent call last): File "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/photos/vision/features/delf/extract/global_descriptor/python/revisited_datasets/beam/eval_utils_test.py", line 614, in testProduceOutputVisualization _ = pipeline.run() File "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/pipeline.py", line 553, in run return self.runner.run_pipeline(self, self._options) File "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/interactive_runner.py", line 136, in run_pipeline inst.watch_sources(pipeline) File "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/pipeline_instrument.py", line 1008, in watch_sources retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit' Probably related to this change: https://github.com/apache/beam/pull/13335 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-11339) Cache clean up might fail on Windows
Ning Kang created BEAM-11339: Summary: Cache clean up might fail on Windows Key: BEAM-11339 URL: https://issues.apache.org/jira/browse/BEAM-11339 Project: Beam Issue Type: Improvement Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang Details can be found in this PR: https://github.com/apache/beam/pull/12779
{code:java}
apache_beam\runners\interactive\recording_manager_test.py:75:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
apache_beam\runners\interactive\interactive_environment.py:118: in new_env
    _interactive_beam_env.cleanup()
apache_beam\runners\interactive\interactive_environment.py:272: in cleanup
    cache_manager.cleanup()
apache_beam\runners\interactive\caching\streaming_cache.py:391: in cleanup
    shutil.rmtree(self._cache_dir)
c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree
    return _rmtree_unsafe(path, onerror)
c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
...
>   os.unlink(fullname)
E   PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-4m1c1oje2145793178144\\full\\fb91a47796-2145832985040-2145832986608-2145793178144'
{code}
This does not happen on Linux-like systems: https://user-images.githubusercontent.com/4423149/100034273-b343ef80-2db0-11eb-8988-dcfc2c79322c.png A potential solution is to enable ignore_errors and, on error, log a message asking the user to clean up the leftover files manually. -- This message was sent by Atlassian Jira (v8.3.4#803005)
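A sketch of that mitigation using only the standard library (the logger name and message are illustrative, not what the eventual fix uses):

```python
import logging
import os
import shutil
import tempfile

_LOGGER = logging.getLogger('interactive_cache_cleanup')

def cleanup(cache_dir):
    def on_error(func, path, exc_info):
        # Instead of crashing on e.g. WinError 32 (file held by another
        # process), log and ask the user to remove the leftovers manually.
        _LOGGER.warning(
            'Failed to remove %s during cache cleanup; please clean it up '
            'manually.', path)
    shutil.rmtree(cache_dir, onerror=on_error)

# Usage: cleanup degrades to a warning instead of raising PermissionError.
cache_dir = tempfile.mkdtemp()
with open(os.path.join(cache_dir, 'cache-entry'), 'w') as f:
    f.write('cached bytes')
cleanup(cache_dir)
```

(Python 3.12 deprecates the `onerror` parameter in favor of `onexc`, which receives the exception instead of `sys.exc_info()`.)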
[jira] [Assigned] (BEAM-10970) apache_beam.runners.interactive.recording_manager_test.ElementStreamTest is flaky on Windows
[ https://issues.apache.org/jira/browse/BEAM-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reassigned BEAM-10970: Assignee: Sam Rohde (was: Ning Kang) > apache_beam.runners.interactive.recording_manager_test.ElementStreamTest is > flaky on Windows > > > Key: BEAM-10970 > URL: https://issues.apache.org/jira/browse/BEAM-10970 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Robin Qiu >Assignee: Sam Rohde >Priority: P1 > Labels: currently-failing > > There are two test suites that are failing now: > > Python Unit Tests (macos-latest, 3.7, py37): > [https://github.com/apache/beam/pull/12935/checks?check_run_id=1163443154] > > Python Unit Tests (windows-latest, 3.7, py37): > [https://github.com/apache/beam/pull/12935/checks?check_run_id=1163443198] > > I am not very familiar with the Python tests suites in GitHub check, but I > think this failure should be a release blocker? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10970) apache_beam.runners.interactive.recording_manager_test.ElementStreamTest is flaky on Windows
[ https://issues.apache.org/jira/browse/BEAM-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218681#comment-17218681 ] Ning Kang commented on BEAM-10970: -- Hi [~rohdesam], I think you mentioned you were working on this issue. Please feel free to mark this ticket as a duplicate of the one you are working on. Thanks! > apache_beam.runners.interactive.recording_manager_test.ElementStreamTest is > flaky on Windows > > > Key: BEAM-10970 > URL: https://issues.apache.org/jira/browse/BEAM-10970 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Robin Qiu >Assignee: Ning Kang >Priority: P1 > Labels: currently-failing > > There are two test suites that are failing now: > > Python Unit Tests (macos-latest, 3.7, py37): > [https://github.com/apache/beam/pull/12935/checks?check_run_id=1163443154] > > Python Unit Tests (windows-latest, 3.7, py37): > [https://github.com/apache/beam/pull/12935/checks?check_run_id=1163443198] > > I am not very familiar with the Python test suites in GitHub check, but I > think this failure should be a release blocker? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-11045) Screen diff integration tests fail
[ https://issues.apache.org/jira/browse/BEAM-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11045: - Status: Resolved (was: In Progress) > Screen diff integration tests fail > -- > > Key: BEAM-11045 > URL: https://issues.apache.org/jira/browse/BEAM-11045 > Project: Beam > Issue Type: Test > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 2h > Remaining Estimate: 0h > > The tests still function with Python 3.7. > Golden screenshots and chromedriver-binary need to be updated because of the > version advancement of the newest Chrome browser. > The tests start failing with Python 3.8 due to an incompatibility with > `Process` from the `multiprocessing` module. > A process is used to start an HTTP server that serves notebook results in a > headless browser for taking screenshots. -- This message was sent by Atlassian Jira (v8.3.4#803005)
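The ticket does not pin down the `Process` incompatibility; one plausible candidate (an assumption on my part, not confirmed by the ticket) is that Python 3.8 switched the default multiprocessing start method on macOS to `spawn`, which pickles the `Process` target and therefore rejects local callables such as lambdas:

```python
import pickle

def serve_notebook_results():
    """Module-level target: picklable, so it works under 'spawn'."""
    pass  # e.g. start an HTTP server here

# Under 'spawn' (default on Windows, and on macOS since Python 3.8),
# multiprocessing pickles the target before the child process starts.
serialized = pickle.dumps(serve_notebook_results)  # fine

spawn_safe = True
try:
    pickle.dumps(lambda: None)  # a lambda/local target fails up front
except (pickle.PicklingError, AttributeError):
    spawn_safe = False
```

Moving the server entry point to a module-level function (or forcing the `fork` start method where available) is the usual workaround for this class of failure.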
[jira] [Work started] (BEAM-11045) Screen diff integration tests fail
[ https://issues.apache.org/jira/browse/BEAM-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-11045 started by Ning Kang. > Screen diff integration tests fail > -- > > Key: BEAM-11045 > URL: https://issues.apache.org/jira/browse/BEAM-11045 > Project: Beam > Issue Type: Test > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 2h > Remaining Estimate: 0h > > The tests still function with Python 3.7. > Golden screenshots and chromedriver-binary need to be updated because of the > version advancement of the newest Chrome browser. > The tests start failing with Python 3.8 due to an incompatibility with > `Process` from the > `multiprocessing` module. > A process is used to start an HTTP server that serves notebook results in a > headless browser for taking screenshots. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-11056) Fix warning message and rename old APIs
[ https://issues.apache.org/jira/browse/BEAM-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11056: - Status: Resolved (was: In Progress) > Fix warning message and rename old APIs > --- > > Key: BEAM-11056 > URL: https://issues.apache.org/jira/browse/BEAM-11056 > Project: Beam > Issue Type: Bug > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 3.5h > Remaining Estimate: 0h > > When invoking `ib.evict_captured_data()`, the logging contains a typo > `recordeddata`: > `You have requested Interactive Beam to evict all recordeddata that could be > deterministically replayed among multiple pipeline runs.` > Also, the `capture_control` should be renamed to `record_control`. All > occurrences of `capture` should be changed to `record` to keep the > consistency of large source recording improvements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-11056) Fix warning message and rename old APIs
[ https://issues.apache.org/jira/browse/BEAM-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-11056 started by Ning Kang. > Fix warning message and rename old APIs > --- > > Key: BEAM-11056 > URL: https://issues.apache.org/jira/browse/BEAM-11056 > Project: Beam > Issue Type: Bug > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 3.5h > Remaining Estimate: 0h > > When invoking `ib.evict_captured_data()`, the logging contains a typo > `recordeddata`: > `You have requested Interactive Beam to evict all recordeddata that could be > deterministically replayed among multiple pipeline runs.` > Also, the `capture_control` should be renamed to `record_control`. All > occurrences of `capture` should be changed to `record` to keep the > consistency of large source recording improvements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-11056) Fix warning message and rename old APIs
Ning Kang created BEAM-11056: Summary: Fix warning message and rename old APIs Key: BEAM-11056 URL: https://issues.apache.org/jira/browse/BEAM-11056 Project: Beam Issue Type: Bug Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang When invoking `ib.evict_captured_data()`, the log message contains a typo, `recordeddata`: `You have requested Interactive Beam to evict all recordeddata that could be deterministically replayed among multiple pipeline runs.` Also, `capture_control` should be renamed to `record_control`. All occurrences of `capture` should be changed to `record` to keep consistency with the large source recording improvements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-11039) Resolve conflicts between TFMA and Facets imports
[ https://issues.apache.org/jira/browse/BEAM-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11039: - Status: Resolved (was: In Progress) > Resolve conflicts between TFMA and Facets imports > - > > Key: BEAM-11039 > URL: https://issues.apache.org/jira/browse/BEAM-11039 > Project: Beam > Issue Type: Test > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 1h 10m > Remaining Estimate: 0h > > In a trusted notebook, Facets will load automatically, and declare extra > variables in the global scope > (https://screenshot.googleplex.com/9J7JGH8xVUjtfqa). This can have side > effects if other libraries try to declare the same variables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-11039) Resolve conflicts between TFMA and Facets imports
[ https://issues.apache.org/jira/browse/BEAM-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-11039 started by Ning Kang. > Resolve conflicts between TFMA and Facets imports > - > > Key: BEAM-11039 > URL: https://issues.apache.org/jira/browse/BEAM-11039 > Project: Beam > Issue Type: Test > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 1h 10m > Remaining Estimate: 0h > > In a trusted notebook, Facets will load automatically, and declare extra > variables in the global scope > (https://screenshot.googleplex.com/9J7JGH8xVUjtfqa). This can have side > effects if other libraries try to declare the same variables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-11045) Screen diff integration tests fail
Ning Kang created BEAM-11045: Summary: Screen diff integration tests fail Key: BEAM-11045 URL: https://issues.apache.org/jira/browse/BEAM-11045 Project: Beam Issue Type: Test Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang The tests still function with Python 3.7. Golden screenshots and chromedriver-binary need to be updated because of the version advancement of the newest Chrome browser. The tests start failing with Python 3.8 due to an incompatibility with `Process` from the `multiprocessing` module. A process is used to start an HTTP server that serves notebook results in a headless browser for taking screenshots. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-11039) Resolve conflicts between TFMA and Facets imports
[ https://issues.apache.org/jira/browse/BEAM-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209894#comment-17209894 ] Ning Kang commented on BEAM-11039: -- I had a discussion with James from the Facets side and decided to enclose the HTML templates in Interactive Beam with an iframe to avoid conflicts with TFMA. This also resolves two mis-renderings in JupyterLab: * Weirdly shaped widgets when the HTML import happens in the same notebook cell that renders the Facets widgets. * Empty displays (except for the first rendering) when multiple cells render Facets widgets and the HTML is imported in each notebook output area. > Resolve conflicts between TFMA and Facets imports > - > > Key: BEAM-11039 > URL: https://issues.apache.org/jira/browse/BEAM-11039 > Project: Beam > Issue Type: Test > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > > In a trusted notebook, Facets will load automatically, and declare extra > variables in the global scope > (https://screenshot.googleplex.com/9J7JGH8xVUjtfqa). This can have side > effects if other libraries try to declare the same variables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
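A sketch of the iframe-isolation idea mentioned in the comment above (the helper name is made up; the real change lives in Interactive Beam's display code):

```python
import html

def wrap_in_iframe(widget_html):
    """Render widget HTML through an iframe `srcdoc` so its scripts run in
    a separate browsing context and cannot clash with globals (such as the
    ones TFMA declares) on the hosting notebook page."""
    return ('<iframe sandbox="allow-scripts" srcdoc="{}"></iframe>'
            .format(html.escape(widget_html, quote=True)))

# The widget's own markup is escaped into the srcdoc attribute, so nothing
# from it leaks into the outer page's global scope.
snippet = wrap_in_iframe('<script>var facets_globals = 1;</script>')
```

The `sandbox="allow-scripts"` attribute lets the widget's JavaScript run while keeping it out of the parent document.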
[jira] [Created] (BEAM-11039) Resolve conflicts between TFMA and Facets imports
Ning Kang created BEAM-11039: Summary: Resolve conflicts between TFMA and Facets imports Key: BEAM-11039 URL: https://issues.apache.org/jira/browse/BEAM-11039 Project: Beam Issue Type: Test Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang In a trusted notebook, Facets will load automatically, and declare extra variables in the global scope (https://screenshot.googleplex.com/9J7JGH8xVUjtfqa). This can have side effects if other libraries try to declare the same variables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-11025) PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit
[ https://issues.apache.org/jira/browse/BEAM-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-11025: - Status: Resolved (was: In Progress) > PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in > precommit > - > > Key: BEAM-11025 > URL: https://issues.apache.org/jira/browse/BEAM-11025 > Project: Beam > Issue Type: Test > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 0.5h > Remaining Estimate: 0h > > https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/ > Error Message > AssertionError: None is not an instance of > Stacktrace > self = > testMethod=test_dynamic_plotting_return_handle> > def test_dynamic_plotting_return_handle(self): > h = pv.visualize( > self._stream, dynamic_plotting_interval=1, display_facets=True) > > self.assertIsInstance(h, timeloop.Timeloop) > E AssertionError: None is not an instance of 'timeloop.app.Timeloop'> > apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: > AssertionError > Standard Output > > >0 > 0 0 > 1 1 > 2 2 > 3 3 > 4 4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-11025) PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit
[ https://issues.apache.org/jira/browse/BEAM-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-11025 started by Ning Kang. > PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in > precommit > - > > Key: BEAM-11025 > URL: https://issues.apache.org/jira/browse/BEAM-11025 > Project: Beam > Issue Type: Test > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 0.5h > Remaining Estimate: 0h > > https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/ > Error Message > AssertionError: None is not an instance of > Stacktrace > self = > testMethod=test_dynamic_plotting_return_handle> > def test_dynamic_plotting_return_handle(self): > h = pv.visualize( > self._stream, dynamic_plotting_interval=1, display_facets=True) > > self.assertIsInstance(h, timeloop.Timeloop) > E AssertionError: None is not an instance of 'timeloop.app.Timeloop'> > apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: > AssertionError > Standard Output > > >0 > 0 0 > 1 1 > 2 2 > 3 3 > 4 4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-11025) PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit
[ https://issues.apache.org/jira/browse/BEAM-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209037#comment-17209037 ] Ning Kang commented on BEAM-11025: -- Based on the standard output, it looks like None is returned by one of the failure paths taken when the environment is not considered to be in a notebook. There could be a race condition when tests run in parallel and modify the global instance ib.current_env(). In that case, the test should be patched to use a mocked `is_in_notebook` check. > PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in > precommit > - > > Key: BEAM-11025 > URL: https://issues.apache.org/jira/browse/BEAM-11025 > Project: Beam > Issue Type: Test > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > > https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/ > Error Message > AssertionError: None is not an instance of > Stacktrace > self = > testMethod=test_dynamic_plotting_return_handle> > def test_dynamic_plotting_return_handle(self): > h = pv.visualize( > self._stream, dynamic_plotting_interval=1, display_facets=True) > > self.assertIsInstance(h, timeloop.Timeloop) > E AssertionError: None is not an instance of 'timeloop.app.Timeloop'> > apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: > AssertionError > Standard Output > > >0 > 0 0 > 1 1 > 2 2 > 3 3 > 4 4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
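[Editor's note] The suggested fix above can be sketched as follows. This is a minimal illustration, not the actual Beam test code: `FakeEnv` and `visualize` here merely mirror the failure path described in the comment (the real check lives behind ib.current_env()), and the point is that patching the notebook check makes the test deterministic regardless of what parallel tests do to the shared global.

```python
from unittest.mock import patch

class FakeEnv:
    """Stands in for the shared ib.current_env() global that parallel tests mutate."""
    def is_in_notebook(self):
        return False  # flaky: another test may have flipped the real global

def visualize(env):
    # Mirrors the failure path: outside a notebook, no Timeloop handle is returned.
    if not env.is_in_notebook():
        return None
    return 'timeloop-handle'

env = FakeEnv()
assert visualize(env) is None  # the flaky failure seen in precommit

# Patching the check pins the environment, removing the race on the global:
with patch.object(FakeEnv, 'is_in_notebook', return_value=True):
    assert visualize(env) == 'timeloop-handle'
```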
[jira] [Created] (BEAM-11025) PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit
Ning Kang created BEAM-11025: Summary: PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit Key: BEAM-11025 URL: https://issues.apache.org/jira/browse/BEAM-11025 Project: Beam Issue Type: Test Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/ Error Message AssertionError: None is not an instance of Stacktrace self = def test_dynamic_plotting_return_handle(self): h = pv.visualize( self._stream, dynamic_plotting_interval=1, display_facets=True) > self.assertIsInstance(h, timeloop.Timeloop) E AssertionError: None is not an instance of apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: AssertionError Standard Output 0 0 0 1 1 2 2 3 3 4 4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10996) AssertionError: (10, ) when writing TF Records
[ https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10996: - Status: Open (was: Triage Needed) > AssertionError: (10, ) when writing TF Records > --- > > Key: BEAM-10996 > URL: https://issues.apache.org/jira/browse/BEAM-10996 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.22.0 >Reporter: Valliappa Lakshmanan >Priority: P1 > Attachments: repro.py > > > This code snippet: > > def create_tfrecord(x): > size = np.array([2.0, 3.0]) > tfexample = tf.train.Example( > features=tf.train.Features( > feature={ > 'size': tf.train.Feature(float_list=tf.train.FloatList(value=size)) > })) > return tfexample.SerializeToString() > > ... > beam.FlatMap(lambda x: create_tfrecord(x)) > ... > > throws this error: > > Traceback (most recent call last): File "apache_beam/runners/common.py", line > 961, in apache_beam.runners.common.DoFnRunner.process File > "apache_beam/runners/common.py", line 726, in > apache_beam.runners.common.PerWindowInvoker.invoke_process File > "apache_beam/runners/common.py", line 814, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, > in process self.writer.write(element) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 420, in write self.sink.write_record(self.temp_handle, value) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 146, in write_record self.write_encoded_record(file_handle, > self.coder.encode(value)) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line > 463, in encode return self.get_impl().encode(value) File > "apache_beam/coders/coder_impl.py", line 494, in > apache_beam.coders.coder_impl.BytesCoderImpl.encode File > "apache_beam/coders/coder_impl.py", line 495, in > 
apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, > ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10996) AssertionError: (10, ) when writing TF Records
[ https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10996: - Status: Triage Needed (was: Open) > AssertionError: (10, ) when writing TF Records > --- > > Key: BEAM-10996 > URL: https://issues.apache.org/jira/browse/BEAM-10996 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.22.0 >Reporter: Valliappa Lakshmanan >Priority: P1 > Attachments: repro.py > > > This code snippet: > > def create_tfrecord(x): > size = np.array([2.0, 3.0]) > tfexample = tf.train.Example( > features=tf.train.Features( > feature={ > 'size': tf.train.Feature(float_list=tf.train.FloatList(value=size)) > })) > return tfexample.SerializeToString() > > ... > beam.FlatMap(lambda x: create_tfrecord(x)) > ... > > throws this error: > > Traceback (most recent call last): File "apache_beam/runners/common.py", line > 961, in apache_beam.runners.common.DoFnRunner.process File > "apache_beam/runners/common.py", line 726, in > apache_beam.runners.common.PerWindowInvoker.invoke_process File > "apache_beam/runners/common.py", line 814, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, > in process self.writer.write(element) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 420, in write self.sink.write_record(self.temp_handle, value) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 146, in write_record self.write_encoded_record(file_handle, > self.coder.encode(value)) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line > 463, in encode return self.get_impl().encode(value) File > "apache_beam/coders/coder_impl.py", line 494, in > apache_beam.coders.coder_impl.BytesCoderImpl.encode File > "apache_beam/coders/coder_impl.py", line 495, in > 
apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, > ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-10996) AssertionError: (10, ) when writing TF Records
[ https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reassigned BEAM-10996: Assignee: (was: Ning Kang) > AssertionError: (10, ) when writing TF Records > --- > > Key: BEAM-10996 > URL: https://issues.apache.org/jira/browse/BEAM-10996 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.22.0 >Reporter: Valliappa Lakshmanan >Priority: P1 > Attachments: repro.py > > > This code snippet: > > def create_tfrecord(x): > size = np.array([2.0, 3.0]) > tfexample = tf.train.Example( > features=tf.train.Features( > feature={ > 'size': tf.train.Feature(float_list=tf.train.FloatList(value=size)) > })) > return tfexample.SerializeToString() > > ... > beam.FlatMap(lambda x: create_tfrecord(x)) > ... > > throws this error: > > Traceback (most recent call last): File "apache_beam/runners/common.py", line > 961, in apache_beam.runners.common.DoFnRunner.process File > "apache_beam/runners/common.py", line 726, in > apache_beam.runners.common.PerWindowInvoker.invoke_process File > "apache_beam/runners/common.py", line 814, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, > in process self.writer.write(element) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 420, in write self.sink.write_record(self.temp_handle, value) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 146, in write_record self.write_encoded_record(file_handle, > self.coder.encode(value)) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line > 463, in encode return self.get_impl().encode(value) File > "apache_beam/coders/coder_impl.py", line 494, in > apache_beam.coders.coder_impl.BytesCoderImpl.encode File > "apache_beam/coders/coder_impl.py", line 495, in > 
apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, > ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10996) AssertionError: (10, ) when writing TF Records
[ https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208270#comment-17208270 ] Ning Kang commented on BEAM-10996: -- First, to clarify: this is not related to the notebook; the pipeline fails no matter where it runs. Second, there are two problems with the pipeline: When using `beam.FlatMap`, the three serialized strings are flattened into dozens of integers, so you get the assertion error because the elements in the PCollection are now of type `int`. You should use `beam.Map` instead of `beam.FlatMap`. If the flattening is intended, you have to convert the int elements into strings, e.g. by appending another transform `beam.Map(lambda x: str(x))`. The SerializeToString output is not decodable by Beam. You can wrap it with `return base64.b64encode(tfexample.SerializeToString()).decode('utf-8')`. > AssertionError: (10, ) when writing TF Records > --- > > Key: BEAM-10996 > URL: https://issues.apache.org/jira/browse/BEAM-10996 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.22.0 >Reporter: Valliappa Lakshmanan >Assignee: Ning Kang >Priority: P1 > Attachments: repro.py > > > This code snippet: > > def create_tfrecord(x): > size = np.array([2.0, 3.0]) > tfexample = tf.train.Example( > features=tf.train.Features( > feature={ > 'size': tf.train.Feature(float_list=tf.train.FloatList(value=size)) > })) > return tfexample.SerializeToString() > > ... > beam.FlatMap(lambda x: create_tfrecord(x)) > ... 
> > throws this error: > > Traceback (most recent call last): File "apache_beam/runners/common.py", line > 961, in apache_beam.runners.common.DoFnRunner.process File > "apache_beam/runners/common.py", line 726, in > apache_beam.runners.common.PerWindowInvoker.invoke_process File > "apache_beam/runners/common.py", line 814, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, > in process self.writer.write(element) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 420, in write self.sink.write_record(self.temp_handle, value) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 146, in write_record self.write_encoded_record(file_handle, > self.coder.encode(value)) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line > 463, in encode return self.get_impl().encode(value) File > "apache_beam/coders/coder_impl.py", line 494, in > apache_beam.coders.coder_impl.BytesCoderImpl.encode File > "apache_beam/coders/coder_impl.py", line 495, in > apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, > ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
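[Editor's note] The FlatMap-vs-Map diagnosis in the comment above can be demonstrated without running Beam at all, because it is a property of Python 3 bytes: iterating over a bytes object yields ints. This sketch mirrors FlatMap/Map semantics with plain list comprehensions; the `record` value is a placeholder, not a real serialized proto.

```python
# Placeholder for what tfexample.SerializeToString() returns: raw bytes.
record = b'\x0a\x12some-serialized-proto'

def create_tfrecord(_):
    return record

# FlatMap semantics: iterate over the return value and emit each item.
# Iterating bytes yields ints, which is what BytesCoderImpl then chokes on.
flatmapped = [item for x in [1] for item in create_tfrecord(x)]
assert all(isinstance(i, int) for i in flatmapped)

# Map semantics: emit the return value as one element, keeping it bytes.
mapped = [create_tfrecord(x) for x in [1]]
assert isinstance(mapped[0], bytes)
```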
[jira] [Assigned] (BEAM-10996) AssertionError: (10, ) when writing TF Records
[ https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reassigned BEAM-10996: Assignee: Ning Kang > AssertionError: (10, ) when writing TF Records > --- > > Key: BEAM-10996 > URL: https://issues.apache.org/jira/browse/BEAM-10996 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.22.0 >Reporter: Valliappa Lakshmanan >Assignee: Ning Kang >Priority: P1 > Attachments: repro.py > > > This code snippet: > > def create_tfrecord(x): > size = np.array([2.0, 3.0]) > tfexample = tf.train.Example( > features=tf.train.Features( > feature={ > 'size': tf.train.Feature(float_list=tf.train.FloatList(value=size)) > })) > return tfexample.SerializeToString() > > ... > beam.FlatMap(lambda x: create_tfrecord(x)) > ... > > throws this error: > > Traceback (most recent call last): File "apache_beam/runners/common.py", line > 961, in apache_beam.runners.common.DoFnRunner.process File > "apache_beam/runners/common.py", line 726, in > apache_beam.runners.common.PerWindowInvoker.invoke_process File > "apache_beam/runners/common.py", line 814, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, > in process self.writer.write(element) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 420, in write self.sink.write_record(self.temp_handle, value) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", > line 146, in write_record self.write_encoded_record(file_handle, > self.coder.encode(value)) File > "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line > 463, in encode return self.get_impl().encode(value) File > "apache_beam/coders/coder_impl.py", line 494, in > apache_beam.coders.coder_impl.BytesCoderImpl.encode File > "apache_beam/coders/coder_impl.py", line 495, in > 
apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, > ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10775) Create precomimt job for typescript tests
[ https://issues.apache.org/jira/browse/BEAM-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10775: - Status: Resolved (was: In Progress) > Create precomimt job for typescript tests > - > > Key: BEAM-10775 > URL: https://issues.apache.org/jira/browse/BEAM-10775 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 10h > Remaining Estimate: 0h > > Create a precommit job that runs jest tests and eslint checks for typescript > code in the Beam repo. > Currently known typescript code is: > the side panel extension - located under > sdks/python/apache_beam/runners/interactive/extensions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core
[ https://issues.apache.org/jira/browse/BEAM-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10827: - Status: Resolved (was: In Progress) > google-cloud-spanner is incompatible with google-cloud-core > --- > > Key: BEAM-10827 > URL: https://issues.apache.org/jira/browse/BEAM-10827 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 1.5h > Remaining Estimate: 0h > > Running into error when building a Beam notebook container > {code:java} > ERROR: google-cloud-spanner 1.18.0 has requirement > google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 > which is incompatible. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core
[ https://issues.apache.org/jira/browse/BEAM-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10827: - Status: Resolved (was: Resolved) > google-cloud-spanner is incompatible with google-cloud-core > --- > > Key: BEAM-10827 > URL: https://issues.apache.org/jira/browse/BEAM-10827 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 1.5h > Remaining Estimate: 0h > > Running into error when building a Beam notebook container > {code:java} > ERROR: google-cloud-spanner 1.18.0 has requirement > google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 > which is incompatible. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core
[ https://issues.apache.org/jira/browse/BEAM-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186740#comment-17186740 ] Ning Kang commented on BEAM-10827: -- [~pabloem] Yes. Marked it as resolved. > google-cloud-spanner is incompatible with google-cloud-core > --- > > Key: BEAM-10827 > URL: https://issues.apache.org/jira/browse/BEAM-10827 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 1.5h > Remaining Estimate: 0h > > Running into error when building a Beam notebook container > {code:java} > ERROR: google-cloud-spanner 1.18.0 has requirement > google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 > which is incompatible. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core
[ https://issues.apache.org/jira/browse/BEAM-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-10827 started by Ning Kang. > google-cloud-spanner is incompatible with google-cloud-core > --- > > Key: BEAM-10827 > URL: https://issues.apache.org/jira/browse/BEAM-10827 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 1.5h > Remaining Estimate: 0h > > Running into error when building a Beam notebook container > {code:java} > ERROR: google-cloud-spanner 1.18.0 has requirement > google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 > which is incompatible. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8551) Beam Python containers should include all Beam SDK dependencies, and not have conflicting dependencies
[ https://issues.apache.org/jira/browse/BEAM-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186180#comment-17186180 ] Ning Kang commented on BEAM-8551: - The Python Docker precommit and cron job do not error out when there is a dependency conflict: https://screenshot.googleplex.com/AVSmsuBCBuEijQR The behavior might change after October 2020. We'll check whether we can add a `pip check` step to the jobs. > Beam Python containers should include all Beam SDK dependencies, and not have > conflicting dependencies > -- > > Key: BEAM-8551 > URL: https://issues.apache.org/jira/browse/BEAM-8551 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: P1 > > Checks could be introduced during container creation, and be enforced by > ValidatesContainer test suites. We could: > - Check pip output or status code for incompatible dependency errors. > - Remove internet access when installing apache-beam in the container, to > make sure all dependencies are installed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
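[Editor's note] The "pip check statement" proposed above could look like the sketch below: `pip check` exits non-zero when installed packages have broken or conflicting requirements (e.g. the google-cloud-spanner vs. google-cloud-core conflict in BEAM-10827). The function name is illustrative; how it is wired into the Jenkins jobs is not specified here.

```python
import subprocess
import sys

def dependency_conflict_report():
    """Runs `pip check`; returns its report when the environment is broken, else None."""
    result = subprocess.run(
        [sys.executable, '-m', 'pip', 'check'],
        capture_output=True, text=True)
    # pip check prints lines like:
    #   google-cloud-spanner 1.18.0 has requirement google-cloud-core<2.0dev,>=1.4.1, ...
    # and exits non-zero when any requirement is unmet or conflicting.
    return result.stdout if result.returncode != 0 else None

report = dependency_conflict_report()
```

A container build or CI step would then fail fast when `report` is not None instead of silently shipping an inconsistent environment.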
[jira] [Created] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core
Ning Kang created BEAM-10827: Summary: google-cloud-spanner is incompatible with google-cloud-core Key: BEAM-10827 URL: https://issues.apache.org/jira/browse/BEAM-10827 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Ning Kang Assignee: Ning Kang Running into error when building a Beam notebook container {code:java} ERROR: google-cloud-spanner 1.18.0 has requirement google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 which is incompatible. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10771) Create whitespacelint precommit for non code files in Beam repo
[ https://issues.apache.org/jira/browse/BEAM-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185995#comment-17185995 ] Ning Kang commented on BEAM-10771: -- [~lcwik] Yes, it's done and [announced|https://lists.apache.org/thread.html/r5e8715301a2cafcd2aa1d972aa4ceb52247b6d7450cd8924fb6d1508%40%3Cdev.beam.apache.org%3E]. The [problem|https://github.com/apache/beam/pull/12689] we had yesterday was caused by code that was force-merged even though whitespacelint had already caught the problem in the PR. > Create whitespacelint precommit for non code files in Beam repo > --- > > Key: BEAM-10771 > URL: https://issues.apache.org/jira/browse/BEAM-10771 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 3h 10m > Remaining Estimate: 0h > > Create a precommit job that runs on changes of non code files in Beam to find > out trailing whitespaces and report such issues. > Supported non code file types are: > * *.md (markdown files) > * build.gradle (gradle build files) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10771) Create whitespacelint precommit for non code files in Beam repo
[ https://issues.apache.org/jira/browse/BEAM-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10771: - Status: Resolved (was: In Progress) > Create whitespacelint precommit for non code files in Beam repo > --- > > Key: BEAM-10771 > URL: https://issues.apache.org/jira/browse/BEAM-10771 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 3h 10m > Remaining Estimate: 0h > > Create a precommit job that runs on changes of non code files in Beam to find > out trailing whitespaces and report such issues. > Supported non code file types are: > * *.md (markdown files) > * build.gradle (gradle build files) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-10771) Create whitespacelint precommit for non code files in Beam repo
[ https://issues.apache.org/jira/browse/BEAM-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-10771 started by Ning Kang. > Create whitespacelint precommit for non code files in Beam repo > --- > > Key: BEAM-10771 > URL: https://issues.apache.org/jira/browse/BEAM-10771 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 2h 10m > Remaining Estimate: 0h > > Create a precommit job that runs on changes of non code files in Beam to find > out trailing whitespaces and report such issues. > Supported non code file types are: > * *.md (markdown files) > * build.gradle (gradle build files) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-10764) Make is_in_ipython robust
[ https://issues.apache.org/jira/browse/BEAM-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-10764 started by Ning Kang. > Make is_in_ipython robust > - > > Key: BEAM-10764 > URL: https://issues.apache.org/jira/browse/BEAM-10764 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 1.5h > Remaining Estimate: 0h > > `is_in_ipython` determines if current code execution is within an IPython > environment by attempting to fetch an IPython kernel through > `IPython.get_ipython()`. > If IPython dependency is not available or a `None` is fetched, the result > would be False. > We've been seeing some users using corrupted IPython dependency in their code > base. > If an IPython dependency is present but throws a non ImportError exception, > it will break the Beam usage. > I assume the similar errors would happen if the user uses an IPython > dependency outside the range of versions in setup.py. > I decide to make the function best effort so that it always returns False > when errors occur. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10764) Make is_in_ipython robust
[ https://issues.apache.org/jira/browse/BEAM-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10764: - Status: Resolved (was: In Progress) > Make is_in_ipython robust > - > > Key: BEAM-10764 > URL: https://issues.apache.org/jira/browse/BEAM-10764 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 1.5h > Remaining Estimate: 0h > > `is_in_ipython` determines if current code execution is within an IPython > environment by attempting to fetch an IPython kernel through > `IPython.get_ipython()`. > If IPython dependency is not available or a `None` is fetched, the result > would be False. > We've been seeing some users using corrupted IPython dependency in their code > base. > If an IPython dependency is present but throws a non ImportError exception, > it will break the Beam usage. > I assume the similar errors would happen if the user uses an IPython > dependency outside the range of versions in setup.py. > I decide to make the function best effort so that it always returns False > when errors occur. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10775) Create precomimt job for typescript tests
[ https://issues.apache.org/jira/browse/BEAM-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10775: - Description: Create a precommit job that runs jest tests and eslint checks for typescript code in the Beam repo. Currently known typescript code is: the side panel extension - located under sdks/python/apache_beam/runners/interactive/extensions was:Create a precommit job that runs jest tests for the side panel extension > Create precomimt job for typescript tests > - > > Key: BEAM-10775 > URL: https://issues.apache.org/jira/browse/BEAM-10775 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > > Create a precommit job that runs jest tests and eslint checks for typescript > code in the Beam repo. > Currently known typescript code is: > the side panel extension - located under > sdks/python/apache_beam/runners/interactive/extensions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10775) Create precomimt job for typescript tests
[ https://issues.apache.org/jira/browse/BEAM-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10775: - Summary: Create precomimt job for typescript tests (was: Create precomimt job for jest tests of side panel extension) > Create precomimt job for typescript tests > - > > Key: BEAM-10775 > URL: https://issues.apache.org/jira/browse/BEAM-10775 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > > Create a precommit job that runs jest tests for the side panel extension -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-10775) Create precomimt job for jest tests of side panel extension
[ https://issues.apache.org/jira/browse/BEAM-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-10775 started by Ning Kang. > Create precomimt job for jest tests of side panel extension > --- > > Key: BEAM-10775 > URL: https://issues.apache.org/jira/browse/BEAM-10775 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > > Create a precommit job that runs jest tests for the side panel extension -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10775) Create precomimt job for jest tests of side panel extension
Ning Kang created BEAM-10775: Summary: Create precomimt job for jest tests of side panel extension Key: BEAM-10775 URL: https://issues.apache.org/jira/browse/BEAM-10775 Project: Beam Issue Type: Improvement Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang Create a precommit job that runs jest tests for the side panel extension -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10771) Create whitespacelint precommit for non code files in Beam repo
Ning Kang created BEAM-10771: Summary: Create whitespacelint precommit for non code files in Beam repo Key: BEAM-10771 URL: https://issues.apache.org/jira/browse/BEAM-10771 Project: Beam Issue Type: Improvement Components: testing Reporter: Ning Kang Assignee: Ning Kang Create a precommit job that runs on changes of non code files in Beam to find out trailing whitespaces and report such issues. Supported non code file types are: * *.md (markdown files) * build.gradle (gradle build files) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10764) Make is_in_ipython robust
Ning Kang created BEAM-10764: Summary: Make is_in_ipython robust Key: BEAM-10764 URL: https://issues.apache.org/jira/browse/BEAM-10764 Project: Beam Issue Type: Improvement Components: runner-py-interactive Reporter: Ning Kang Assignee: Ning Kang `is_in_ipython` determines whether the current code execution is within an IPython environment by attempting to fetch an IPython kernel through `IPython.get_ipython()`. If the IPython dependency is not available or `None` is fetched, the result is False. We've been seeing some users with a corrupted IPython dependency in their code base. If an IPython dependency is present but throws a non-ImportError exception, it breaks Beam usage. I assume similar errors would happen if the user's IPython dependency falls outside the version range in setup.py. I decided to make the function best-effort so that it always returns False when errors occur. -- This message was sent by Atlassian Jira (v8.3.4#803005)
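[Editor's note] The best-effort behavior described above can be sketched like this. It is an illustration of the idea, not the actual Beam implementation: the key point is catching broad `Exception` rather than only `ImportError`, so a corrupted or version-incompatible IPython install degrades to False instead of breaking Beam.

```python
def is_in_ipython():
    """Best-effort check: True only when an IPython kernel is reachable."""
    try:
        # A corrupted or out-of-range IPython install may raise errors
        # other than ImportError during import or kernel lookup.
        from IPython import get_ipython
        return get_ipython() is not None
    except Exception:
        # Any failure in the probe means "not in IPython" rather than a crash.
        return False
```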
[jira] [Work stopped] (BEAM-10637) Fix start/stop hanging issue on test stream service controller
[ https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-10637 stopped by Ning Kang. > Fix start/stop hanging issue on test stream service controller > -- > > Key: BEAM-10637 > URL: https://issues.apache.org/jira/browse/BEAM-10637 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Time Spent: 50m > Remaining Estimate: 0h > > The test stream controller sometimes hangs forever when the underlying grpc > server is started or stopped at the wrong time: > e.g., stopping a server that was never started, stopping a server that is already > stopped, or starting a server that is already started. > The change is to add start/stop states to handle all 6 combinations: > # start server not started > # start server started > # start server stopped > # stop server not started > # stop server started > # stop server stopped > so that it never hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
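The state-guarded start/stop described above can be sketched like this. It is a hypothetical illustration, not the actual test stream service controller; `ServerController` and its fields are invented names, and the wrapped `server` is assumed to expose `start()`/`stop()` like a gRPC server:

```python
import threading


class ServerController:
    """Guards an underlying server so start/stop calls never hang.

    Explicit started/stopped flags make every combination of
    (start | stop) x (not started | started | stopped) either a real
    transition or a safe no-op, covering the 6 cases in the issue.
    """

    def __init__(self, server):
        self._server = server  # object with start() and stop() methods
        self._lock = threading.Lock()
        self._started = False
        self._stopped = False

    def start(self):
        with self._lock:
            # Starting an already-started or already-stopped server: no-op.
            if self._started or self._stopped:
                return
            self._server.start()
            self._started = True

    def stop(self):
        with self._lock:
            # Stopping a never-started or already-stopped server: no-op.
            if not self._started or self._stopped:
                return
            self._server.stop()
            self._stopped = True
```

Because every path either transitions state or returns immediately, no call sequence can block on the underlying server in the wrong state.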
[jira] [Updated] (BEAM-10637) Fix start/stop hanging issue on test stream service controller
[ https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10637: - Status: Resolved (was: Open) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-10637) Fix start/stop hanging issue on test stream service controller
[ https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-10637 started by Ning Kang. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10637) Fix start/stop hanging issue on test stream service controller
[ https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10637: - Status: Open (was: Triage Needed) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (BEAM-10637) Fix start/stop hanging issue on test stream service controller
[ https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang reopened BEAM-10637: -- Try to make the resolution resolved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10627) tests fails on windows - interactive tests fails due to FileNotFoundError
[ https://issues.apache.org/jira/browse/BEAM-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175728#comment-17175728 ] Ning Kang commented on BEAM-10627: -- [~TobKed] I don't think there is any difference in terms of these tests between Py3.5 and Py3.7. But many interactive features require Py3.6+ to work. So it's okay to skip all these tests for Py3.5. > tests fails on windows - interactive tests fails due to FileNotFoundError > - > > Key: BEAM-10627 > URL: https://issues.apache.org/jira/browse/BEAM-10627 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Tobiasz Kedzierski >Assignee: Ning Kang >Priority: P2 > Attachments: BEAM-10627.txt > > > Failing tests: > apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest.test_basic > apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest.test_wordcount > apache_beam.runners.interactive.interactive_beam_test.InteractiveBeamTest.test_show_always_watch_given_pcolls > apache_beam.runners.interactive.interactive_beam_test.InteractiveBeamTest.test_show_mark_pcolls_computed_when_done > Link to the github workflow run with mentioned error: > [https://github.com/TobKed/beam/runs/937336438?check_suite_focus=true] > partial log: > 2020-08-02T11:05:43.5852779Z ___ > InteractiveBeamTest.test_show_always_watch_given_pcolls ___ > 2020-08-02T11:05:43.5853476Z [gw3] win32 -- Python 3.5.4 > d:\a\beam\beam\sdks\python\target\.tox\py35-win\scripts\python.exe > 2020-08-02T11:05:43.5853847Z > 2020-08-02T11:05:43.5855313Z self = > testMethod=test_show_always_watch_given_pcolls> > 2020-08-02T11:05:43.5855658Z > 2020-08-02T11:05:43.5855975Z def > test_show_always_watch_given_pcolls(self): > 2020-08-02T11:05:43.5856278Z p = beam.Pipeline(ir.InteractiveRunner()) > 2020-08-02T11:05:43.5856566Z # pylint: > disable=range-builtin-not-iterating > 2020-08-02T11:05:43.5856845Z pcoll = p | 'Create' >> > beam.Create(range(10)) > 
2020-08-02T11:05:43.5857355Z # The pcoll is not watched since > watch(locals()) is not explicitly called. > 2020-08-02T11:05:43.5858106Z self.assertFalse(pcoll in > _get_watched_pcollections_with_variable_names()) > 2020-08-02T11:05:43.5858620Z # The call of show watches pcoll. > 2020-08-02T11:05:43.5859235Z > ib.show(pcoll) > 2020-08-02T11:05:43.5859475Z > 2020-08-02T11:05:43.5860015Z > apache_beam\runners\interactive\interactive_beam_test.py:96: > 2020-08-02T11:05:43.5861024Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > 2020-08-02T11:05:43.5861944Z apache_beam\runners\interactive\utils.py:205: in > run_within_progress_indicator > 2020-08-02T11:05:43.5862682Z return func(*args, **kwargs) > 2020-08-02T11:05:43.5863214Z > apache_beam\runners\interactive\interactive_beam.py:411: in show > 2020-08-02T11:05:43.5863760Z result = pf.PipelineFragment(list(pcolls), > user_pipeline.options).run() > 2020-08-02T11:05:43.5864291Z > apache_beam\runners\interactive\pipeline_fragment.py:113: in run > 2020-08-02T11:05:43.5864746Z return self.deduce_fragment().run() > 2020-08-02T11:05:43.5865292Z apache_beam\pipeline.py:521: in run > 2020-08-02T11:05:43.5865633Z allow_proto_holders=True).run(False) > 2020-08-02T11:05:43.5866159Z apache_beam\pipeline.py:534: in run > 2020-08-02T11:05:43.5866638Z return self.runner.run_pipeline(self, > self._options) > 2020-08-02T11:05:43.5867299Z > apache_beam\runners\interactive\interactive_runner.py:194: in run_pipeline > 2020-08-02T11:05:43.5867667Z pipeline_to_execute.run(), > pipeline_instrument) > 2020-08-02T11:05:43.5868119Z apache_beam\pipeline.py:534: in run > 2020-08-02T11:05:43.5868627Z return self.runner.run_pipeline(self, > self._options) > 2020-08-02T11:05:43.5869401Z apache_beam\runners\direct\direct_runner.py:119: > in run_pipeline > 2020-08-02T11:05:43.5869735Z return runner.run_pipeline(pipeline, options) > 2020-08-02T11:05:43.5870201Z > 
apache_beam\runners\portability\fn_api_runner\fn_runner.py:176: in > run_pipeline > 2020-08-02T11:05:43.5870665Z > pipeline.to_runner_api(default_environment=self._default_environment)) > 2020-08-02T11:05:43.5871520Z > apache_beam\runners\portability\fn_api_runner\fn_runner.py:186: in > run_via_runner_api > 2020-08-02T11:05:43.5871987Z return self.run_stages(stage_context, stages) > 2020-08-02T11:05:43.5872612Z > apache_beam\runners\portability\fn_api_runner\fn_runner.py:344: in run_stages > 2020-08-02T11:05:43.5872918Z bundle_context_manager, > 2020-08-02T11:05:43.5873512Z > apache_beam\runners\portability\fn_api_runner\fn_runner.py:523: in _run_stage > 2020-08-02T11:05:43.5873851Z bundle_mana
[jira] [Updated] (BEAM-10637) Fix start/stop hanging issue on test stream service controller
[ https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10637: - Status: Resolved (was: Resolved) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10637) Fix start/stop hanging issue on test stream service controller
[ https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10637: - Status: Resolved (was: Open) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10635) Incompatible google-api-core and google-cloud-bigquery
[ https://issues.apache.org/jira/browse/BEAM-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10635: - Status: Resolved (was: Resolved) > Incompatible google-api-core and google-cloud-bigquery > -- > > Key: BEAM-10635 > URL: https://issues.apache.org/jira/browse/BEAM-10635 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P0 > Time Spent: 40m > Remaining Estimate: 0h > > This bigquery version bump: > https://github.com/apache/beam/commit/a315672dbe2b82e013e8120ac5398c623c7b13a7 > seems to have broken the pip check: > google-cloud-bigquery 1.26.1 has requirement google-api-core<2.0dev,>=1.21.0, > but you have google-api-core 1.20.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10635) Incompatible google-api-core and google-cloud-bigquery
[ https://issues.apache.org/jira/browse/BEAM-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Kang updated BEAM-10635: - Status: Resolved (was: Triage Needed) -- This message was sent by Atlassian Jira (v8.3.4#803005)