[jira] [Created] (BEAM-12506) Change WindowedValueHolder into a Row Schema

2021-06-17 Thread Ning Kang (Jira)
Ning Kang created BEAM-12506:


 Summary: Change WindowedValueHolder into a Row Schema
 Key: BEAM-12506
 URL: https://issues.apache.org/jira/browse/BEAM-12506
 Project: Beam
  Issue Type: Improvement
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


The WindowedValueHolder is a Python type that requires a special 
`SafeFastPrimitivesCoder` instead of the native `FastPrimitivesCoder` in 
cache_manager to encode and decode.

When a cached WindowedValueHolder is read and an external transform such as 
SqlTransform is applied to it, the pipeline picks up a pickled Python coder 
that is not xLang friendly.

We could build a Row schema to hold the WindowedValueHolder to make the cache 
reading xLang friendly and also get rid of the additional layer of 
`SafeFastPrimitivesCoder`.
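
A minimal sketch of what such a row could look like, assuming a hypothetical type named `WindowedValueRow`; the field names, `to_row` and `from_row` are illustrative and do not come from the issue. It uses only primitive fields, with JSON standing in for the element's own coder so that the snippet runs with the standard library alone.

```python
import json
from typing import List, NamedTuple


class WindowedValueRow(NamedTuple):
  """Hypothetical flat view of a windowed value using only primitive fields."""
  value: bytes  # element payload, encoded by the element's own coder
  timestamp_micros: int  # event time in microseconds
  window_end_micros: List[int]  # end of each window the element belongs to
  pane_index: int  # index of the pane that emitted the element


def to_row(element, timestamp_micros, window_end_micros, pane_index=0):
  # JSON stands in for the element coder in this sketch.
  return WindowedValueRow(
      value=json.dumps(element).encode('utf-8'),
      timestamp_micros=timestamp_micros,
      window_end_micros=list(window_end_micros),
      pane_index=pane_index)


def from_row(row):
  return json.loads(row.value.decode('utf-8')), row.timestamp_micros


row = to_row({'id': 1}, 1623900000000000, [1623900060000000])
```

With Beam installed, a `NamedTuple` like this could presumably be registered as a schema type via `beam.coders.registry.register_coder(WindowedValueRow, beam.coders.RowCoder)`; whether the cache would use exactly these fields is an open design question.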

We could 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform

2021-06-02 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10708 started by Ning Kang.

> InteractiveRunner cannot execute pipeline with cross-language transform
> ---
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> The InteractiveRunner crashes when given a pipeline that includes a 
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
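> {code:python}
> import apache_beam as beam
> from apache_beam.runners.interactive import interactive_beam
> from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
> from apache_beam.transforms.sql import SqlTransform
>
> p = beam.Pipeline(InteractiveRunner())
> pc = (p | SqlTransform("""SELECT
>     CAST(1 AS INT) AS `id`,
>     CAST('foo' AS VARCHAR) AS `str`,
>     CAST(3.14  AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}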
> The problem occurs when 
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
>  creates a copy of the pipeline by [writing it to proto and reading it 
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
>  Reading it back fails because some of the pipeline is not written in Python.





[jira] [Assigned] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform

2021-06-02 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reassigned BEAM-10708:


Assignee: Ning Kang

> InteractiveRunner cannot execute pipeline with cross-language transform
> ---
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> The InteractiveRunner crashes when given a pipeline that includes a 
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
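> {code:python}
> import apache_beam as beam
> from apache_beam.runners.interactive import interactive_beam
> from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
> from apache_beam.transforms.sql import SqlTransform
>
> p = beam.Pipeline(InteractiveRunner())
> pc = (p | SqlTransform("""SELECT
>     CAST(1 AS INT) AS `id`,
>     CAST('foo' AS VARCHAR) AS `str`,
>     CAST(3.14  AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}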
> The problem occurs when 
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
>  creates a copy of the pipeline by [writing it to proto and reading it 
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
>  Reading it back fails because some of the pipeline is not written in Python.





[jira] [Comment Edited] (BEAM-12391) WriteToAvro fails if fastavro loads its python implementation of writer

2021-05-27 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352650#comment-17352650
 ] 

Ning Kang edited comment on BEAM-12391 at 5/27/21, 5:27 PM:


Beam notebooks have mitigated the issue by setting an environment variable 
{code:json}
{"env": {"LD_LIBRARY_PATH":"/opt/conda/lib"}}
{code}
 in each IPython kernel.
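
For context, a Jupyter kernel spec (`kernel.json`) accepts an `env` mapping, which is where a setting like the one above would live. The following is a minimal sketch of adding the variable to a kernel spec file; the helper name `add_kernel_env` and the throwaway demo file are assumptions for illustration, not how Beam notebooks actually apply the setting.

```python
import json
import os
import tempfile


def add_kernel_env(kernel_json_path, name, value):
  """Adds or overwrites one environment variable in a kernel.json file."""
  with open(kernel_json_path) as f:
    spec = json.load(f)
  spec.setdefault('env', {})[name] = value
  with open(kernel_json_path, 'w') as f:
    json.dump(spec, f, indent=1)
  return spec


# Demonstration against a throwaway kernel spec rather than a real install.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'kernel.json')
with open(demo_path, 'w') as f:
  json.dump({
      'argv': ['python', '-m', 'ipykernel_launcher', '-f', '{connection_file}'],
      'display_name': 'Python 3',
      'language': 'python',
  }, f)

updated = add_kernel_env(demo_path, 'LD_LIBRARY_PATH', '/opt/conda/lib')
```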


was (Author: ningk):
Beam notebooks have mitigated the issue by setting an environment variable 
`"env": {"LD_LIBRARY_PATH":"/opt/conda/lib"}` in each IPython kernel.

> WriteToAvro fails if fastavro loads its python implementation of writer
> ---
>
> Key: BEAM-12391
> URL: https://issues.apache.org/jira/browse/BEAM-12391
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-avro
>Affects Versions: 2.25.0, 2.26.0, 2.27.0, 2.28.0, 2.29.0
>Reporter: Chris Chandler
>Priority: P2
>
> It's possible for fastavro to fail to correctly load its cython 
> implementation of the Writer class in which case it will silently fall back 
> to a pure python implementation. If this happens there's no outward 
> indication, but line 621 in io/avroio.py will fail because writer.fo is only 
> present on the cython implementation.
> To reproduce you can modify fastavro's write.py to only use its fallback:
> {code}
> #from . import _write
> from . import _write_py as _write
> {code}
> And then run a workflow that sinks to WriteToAvro(use_fastavro=True).
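
The failure mode described above can be illustrated without fastavro. In this sketch, `CompiledWriter` and `FallbackWriter` are hypothetical stand-ins (not fastavro's real classes) for an implementation that exposes `fo` and one that does not, and `output_file` is an illustrative guard that turns the silent mismatch into a descriptive error.

```python
import io


class CompiledWriter:
  """Stand-in for an implementation that exposes the output file as `fo`."""
  def __init__(self, fo):
    self.fo = fo


class FallbackWriter:
  """Stand-in for a fallback implementation that has no public `fo`."""
  def __init__(self, fo):
    self._fo = fo


def output_file(writer):
  """Returns writer.fo, failing with a descriptive error if it is missing."""
  fo = getattr(writer, 'fo', None)
  if fo is None:
    raise RuntimeError(
        '%s has no `fo` attribute; a fallback implementation may have been '
        'loaded' % type(writer).__name__)
  return fo


buf = io.BytesIO()
compiled_fo = output_file(CompiledWriter(buf))

try:
  output_file(FallbackWriter(buf))
  fallback_error = None
except RuntimeError as e:
  fallback_error = str(e)
```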





[jira] [Commented] (BEAM-12391) WriteToAvro fails if fastavro loads its python implementation of writer

2021-05-27 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352650#comment-17352650
 ] 

Ning Kang commented on BEAM-12391:
--

Beam notebooks have mitigated the issue by setting an environment variable 
`"env": {"LD_LIBRARY_PATH":"/opt/conda/lib"}` in each IPython kernel.

> WriteToAvro fails if fastavro loads its python implementation of writer
> ---
>
> Key: BEAM-12391
> URL: https://issues.apache.org/jira/browse/BEAM-12391
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-avro
>Affects Versions: 2.25.0, 2.26.0, 2.27.0, 2.28.0, 2.29.0
>Reporter: Chris Chandler
>Priority: P2
>
> It's possible for fastavro to fail to correctly load its cython 
> implementation of the Writer class in which case it will silently fall back 
> to a pure python implementation. If this happens there's no outward 
> indication, but line 621 in io/avroio.py will fail because writer.fo is only 
> present on the cython implementation.
> To reproduce you can modify fastavro's write.py to only use its fallback:
> {code}
> #from . import _write
> from . import _write_py as _write
> {code}
> And then run a workflow that sinks to WriteToAvro(use_fastavro=True).





[jira] [Commented] (BEAM-12391) WriteToAvro fails if fastavro loads its python implementation of writer

2021-05-24 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350620#comment-17350620
 ] 

Ning Kang commented on BEAM-12391:
--

Thanks for letting me know. I've taken the external issue.

> WriteToAvro fails if fastavro loads its python implementation of writer
> ---
>
> Key: BEAM-12391
> URL: https://issues.apache.org/jira/browse/BEAM-12391
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-avro
>Affects Versions: 2.25.0, 2.26.0, 2.27.0, 2.28.0, 2.29.0
>Reporter: Chris Chandler
>Priority: P2
>
> It's possible for fastavro to fail to correctly load its cython 
> implementation of the Writer class in which case it will silently fall back 
> to a pure python implementation. If this happens there's no outward 
> indication, but line 621 in io/avroio.py will fail because writer.fo is only 
> present on the cython implementation.
> To reproduce you can modify fastavro's write.py to only use its fallback:
> {code}
> #from . import _write
> from . import _write_py as _write
> {code}
> And then run a workflow that sinks to WriteToAvro(use_fastavro=True).





[jira] [Commented] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform

2021-05-18 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347084#comment-17347084
 ] 

Ning Kang commented on BEAM-10708:
--

[~chamikara] Thanks for asking. The first PR (PR#14368), which introduced a new 
augmented_pipeline module to replace pipeline_instrument, has been merged. I'll 
be working on this after finishing some other higher-priority tasks. 
Some known users depend on the old modules that use runner API roundtrips, so 
I'll work with them to roll out the new module. We might need to keep the old 
modules as deprecated for a while.

> InteractiveRunner cannot execute pipeline with cross-language transform
> ---
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Brian Hulette
>Priority: P2
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> The InteractiveRunner crashes when given a pipeline that includes a 
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
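> {code:python}
> import apache_beam as beam
> from apache_beam.runners.interactive import interactive_beam
> from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
> from apache_beam.transforms.sql import SqlTransform
>
> p = beam.Pipeline(InteractiveRunner())
> pc = (p | SqlTransform("""SELECT
>     CAST(1 AS INT) AS `id`,
>     CAST('foo' AS VARCHAR) AS `str`,
>     CAST(3.14  AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}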
> The problem occurs when 
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
>  creates a copy of the pipeline by [writing it to proto and reading it 
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
>  Reading it back fails because some of the pipeline is not written in Python.





[jira] [Updated] (BEAM-12096) Flake: test_progress_in_HTML_JS_when_in_notebook

2021-04-19 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-12096:
-
Fix Version/s: Not applicable
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> Flake: test_progress_in_HTML_JS_when_in_notebook
> 
>
> Key: BEAM-12096
> URL: https://issues.apache.org/jira/browse/BEAM-12096
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Ning Kang
>Priority: P1
>  Labels: flake
> Fix For: Not applicable
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> * 
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/18008/testReport/junit/apache_beam.runners.interactive.utils_test/ProgressIndicatorTest/test_progress_in_HTML_JS_when_in_notebook/
> self =  testMethod=test_progress_in_HTML_JS_when_in_notebook>
>
>     def test_progress_in_HTML_JS_when_in_notebook(self):
>       ie.current_env()._is_in_notebook = True
>       pi_path = 'apache_beam.runners.interactive.utils.ProgressIndicator'
>       with patch('IPython.core.display.HTML') as mocked_html, \
>         patch('IPython.core.display.Javascript') as mocked_javascript, \
>         patch(pi_path + '.spinner_template') as enter_template, \
>         patch(pi_path + '.spinner_removal_template') as exit_template:
>         enter_template.format.return_value = 'enter'
>         exit_template.format.return_value = 'exit'
>
>         @utils.progress_indicated
>         def progress_indicated_dummy():
>           mocked_html.assert_any_call('enter')
>
> >       progress_indicated_dummy()
>
> apache_beam/runners/interactive/utils_test.py:217:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> apache_beam/runners/interactive/utils.py:235: in run_within_progress_indicator
>     return func(*args, **kwargs)
> apache_beam/runners/interactive/utils_test.py:215: in progress_indicated_dummy
>     mocked_html.assert_any_call('enter')
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> self = , args = ('enter',)
> kwargs = {}, expected = (('enter',), {})
> actual = [call('\n https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...";|https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...]
> You have limited Interactive Beam features since your ipython kernel is not connected any notebook frontend.')]
> cause = None, expected_string = "HTML('enter')"
>
>     def assert_any_call(self, *args, **kwargs):
>         """assert the mock has been called with the specified arguments.
>
>         The assert passes if the mock has *ever* been called, unlike
>         `assert_called_with` and `assert_called_once_with` that only pass if
>         the call is the most recent one."""
>         expected = self._call_matcher((args, kwargs))
>         actual = [self._call_matcher(c) for c in self.call_args_list]
>         if expected not in actual:
>             cause = expected if isinstance(expected, Exception) else None
>             expected_string = self._format_mock_call_signature(args, kwargs)
>             raise AssertionError(
>                 '%s call not found' % expected_string
> >           ) from cause
> E           AssertionError: HTML('enter') call not found
>
> /usr/lib/python3.7/unittest/mock.py:891: AssertionError
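
The last frame of the trace is `unittest.mock`'s `assert_any_call`, which passes only if at least one recorded call matches the given arguments. A minimal sketch of that behaviour, assuming a plain `Mock` in place of the patched `IPython.core.display.HTML` from the test:

```python
from unittest import mock

mocked_html = mock.Mock()
mocked_html('some other markup')  # a call is recorded, but not the expected one

try:
  mocked_html.assert_any_call('enter')
  message = None
except AssertionError as e:
  message = str(e)

mocked_html('enter')
mocked_html('exit')
# Passes now: 'enter' is among the recorded calls, even though it is not the
# most recent one.
mocked_html.assert_any_call('enter')
```

In the flaky run, the only recorded `HTML(...)` call was the "limited Interactive Beam features" notice, which is the same situation as the first `assert_any_call` above.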





[jira] [Updated] (BEAM-12165) ParquetIO sink should allow to pass an Avro data model

2021-04-19 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-12165:
-
Issue Type: Bug  (was: Improvement)

> ParquetIO sink should allow to pass an Avro data model
> --
>
> Key: BEAM-12165
> URL: https://issues.apache.org/jira/browse/BEAM-12165
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-parquet
>Reporter: Ning Kang
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> AvroParquetWriter instantiated in ParquetIO [1] does not specify the data 
> model.
> The default is SpecificData model [2], while the AvroParquetReader is reading 
> with a GenericData model [3].
> ParquetIO should pass in the correct data model.
> [1] 
> https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1052
> [2] 
> https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroParquetWriter.java#L163
> [3] 
> https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L704





[jira] [Updated] (BEAM-10901) Flaky test: PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection

2021-04-16 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10901:
-
Resolution: Fixed
Status: Resolved  (was: Open)

Mark this as a duplicate.

> Flaky test: 
> PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
> ---
>
> Key: BEAM-10901
> URL: https://issues.apache.org/jira/browse/BEAM-10901
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Ning Kang
>Priority: P1
>  Labels: flake, flaky-test
>
> https://pipelines.actions.githubusercontent.com/COKSSHEuHyOLM1pP93G8aBnLpu5tMJfYiveSzILlm0cIfDvSEl/_apis/pipelines/1/runs/6628/signedlogcontent/15?urlExpires=2020-09-15T18%3A07%3A33.5773511Z&urlSigningMethod=HMACV1&urlSignature=OWYnI0Ba0e%2FDh2cPSD7%2BFoYc6dp9%2F%2BLJwAMGszdaO1M%3D
> {{2020-09-15T17:47:23.9124315Z _ 
> PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
>  _
> 2020-09-15T17:47:23.9127327Z [gw2] win32 -- Python 3.5.4 
> d:\a\beam\beam\sdks\python\target\.tox\py35-win\scripts\python.exe
> 2020-09-15T17:47:23.9128518Z 
> 2020-09-15T17:47:23.9134948Z self = 
>   testMethod=test_able_to_cache_intermediate_unbounded_source_pcollection>
> 2020-09-15T17:47:23.9136122Z 
> 2020-09-15T17:47:23.9136585Z def setUp(self):
> 2020-09-15T17:47:23.9137012Z > ie.new_env()
> 2020-09-15T17:47:23.9137348Z 
> 2020-09-15T17:47:23.9138982Z 
> apache_beam\runners\interactive\pipeline_instrument_test.py:46: 
> 2020-09-15T17:47:23.9139662Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> 2020-09-15T17:47:23.9141259Z 
> apache_beam\runners\interactive\interactive_environment.py:117: in new_env
> 2020-09-15T17:47:23.9141999Z _interactive_beam_env.cleanup()
> 2020-09-15T17:47:23.9142755Z 
> apache_beam\runners\interactive\interactive_environment.py:273: in cleanup
> 2020-09-15T17:47:23.9144712Z cache_manager.cleanup()
> 2020-09-15T17:47:23.9145453Z 
> apache_beam\runners\interactive\caching\streaming_cache.py:391: in cleanup
> 2020-09-15T17:47:23.9147098Z shutil.rmtree(self._cache_dir)
> 2020-09-15T17:47:23.9148525Z 
> c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:494: in rmtree
> 2020-09-15T17:47:23.9149356Z return _rmtree_unsafe(path, onerror)
> 2020-09-15T17:47:23.9150431Z 
> c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:384: in 
> _rmtree_unsafe
> 2020-09-15T17:47:23.9151289Z _rmtree_unsafe(fullname, onerror)
> 2020-09-15T17:47:23.9152010Z 
> c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:389: in 
> _rmtree_unsafe
> 2020-09-15T17:47:23.9152756Z onerror(os.unlink, fullname, sys.exc_info())
> 2020-09-15T17:47:23.9153286Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> 2020-09-15T17:47:23.9153756Z 
> 2020-09-15T17:47:23.9155669Z path = 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py35-win\\tmp\\it-87ky_iwj2483994137376\\full'
> 2020-09-15T17:47:23.9156453Z onerror = .onerror at 
> 0x024259F42268>
> 2020-09-15T17:47:23.9156951Z 
> 2020-09-15T17:47:23.9157456Z def _rmtree_unsafe(path, onerror):
> 2020-09-15T17:47:23.9157949Z try:
> 2020-09-15T17:47:23.9158409Z if os.path.islink(path):
> 2020-09-15T17:47:23.9159821Z # symlinks to directories are 
> forbidden, see bug #1669
> 2020-09-15T17:47:23.9161488Z raise OSError("Cannot call 
> rmtree on a symbolic link")
> 2020-09-15T17:47:23.9162128Z except OSError:
> 2020-09-15T17:47:23.9162713Z onerror(os.path.islink, path, 
> sys.exc_info())
> 2020-09-15T17:47:23.9163363Z # can't continue even if onerror 
> hook returns
> 2020-09-15T17:47:23.9163870Z return
> 2020-09-15T17:47:23.9164305Z names = []
> 2020-09-15T17:47:23.9164715Z try:
> 2020-09-15T17:47:23.9165740Z names = os.listdir(path)
> 2020-09-15T17:47:23.9166181Z except OSError:
> 2020-09-15T17:47:23.9166737Z onerror(os.listdir, path, 
> sys.exc_info())
> 2020-09-15T17:47:23.9167277Z for name in names:
> 2020-09-15T17:47:23.9167796Z fullname = os.path.join(path, name)
> 2020-09-15T17:47:23.9168270Z try:
> 2020-09-15T17:47:23.9168993Z mode = os.lstat(fullname).st_mode
> 2020-09-15T17:47:23.9169511Z except OSError:
> 2020-09-15T17:47:23.9169942Z mode = 0
> 2020-09-15T17:47:23.9170392Z if stat.S_ISDIR(mode):
> 2020-09-15T17:47:23.9170991Z _rmtree_unsafe(fullname, onerror)
> 2020-09-15T17:47:23.9171493Z else:
> 2020-09-15T17:47:23.9171895Z try:
> 2020-09-15T17:47:23.9172276Z

[jira] [Commented] (BEAM-10901) Flaky test: PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection

2021-04-16 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324133#comment-17324133
 ] 

Ning Kang commented on BEAM-10901:
--

[~bhulette] Yes, the root cause should be the same. Let's mark it as a duplicate.

> Flaky test: 
> PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
> ---
>
> Key: BEAM-10901
> URL: https://issues.apache.org/jira/browse/BEAM-10901
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Ning Kang
>Priority: P1
>  Labels: flake, flaky-test
>
> https://pipelines.actions.githubusercontent.com/COKSSHEuHyOLM1pP93G8aBnLpu5tMJfYiveSzILlm0cIfDvSEl/_apis/pipelines/1/runs/6628/signedlogcontent/15?urlExpires=2020-09-15T18%3A07%3A33.5773511Z&urlSigningMethod=HMACV1&urlSignature=OWYnI0Ba0e%2FDh2cPSD7%2BFoYc6dp9%2F%2BLJwAMGszdaO1M%3D
> {{2020-09-15T17:47:23.9124315Z _ 
> PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
>  _
> 2020-09-15T17:47:23.9127327Z [gw2] win32 -- Python 3.5.4 
> d:\a\beam\beam\sdks\python\target\.tox\py35-win\scripts\python.exe
> 2020-09-15T17:47:23.9128518Z 
> 2020-09-15T17:47:23.9134948Z self = 
>   testMethod=test_able_to_cache_intermediate_unbounded_source_pcollection>
> 2020-09-15T17:47:23.9136122Z 
> 2020-09-15T17:47:23.9136585Z def setUp(self):
> 2020-09-15T17:47:23.9137012Z > ie.new_env()
> 2020-09-15T17:47:23.9137348Z 
> 2020-09-15T17:47:23.9138982Z 
> apache_beam\runners\interactive\pipeline_instrument_test.py:46: 
> 2020-09-15T17:47:23.9139662Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> 2020-09-15T17:47:23.9141259Z 
> apache_beam\runners\interactive\interactive_environment.py:117: in new_env
> 2020-09-15T17:47:23.9141999Z _interactive_beam_env.cleanup()
> 2020-09-15T17:47:23.9142755Z 
> apache_beam\runners\interactive\interactive_environment.py:273: in cleanup
> 2020-09-15T17:47:23.9144712Z cache_manager.cleanup()
> 2020-09-15T17:47:23.9145453Z 
> apache_beam\runners\interactive\caching\streaming_cache.py:391: in cleanup
> 2020-09-15T17:47:23.9147098Z shutil.rmtree(self._cache_dir)
> 2020-09-15T17:47:23.9148525Z 
> c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:494: in rmtree
> 2020-09-15T17:47:23.9149356Z return _rmtree_unsafe(path, onerror)
> 2020-09-15T17:47:23.9150431Z 
> c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:384: in 
> _rmtree_unsafe
> 2020-09-15T17:47:23.9151289Z _rmtree_unsafe(fullname, onerror)
> 2020-09-15T17:47:23.9152010Z 
> c:\hostedtoolcache\windows\python\3.5.4\x64\lib\shutil.py:389: in 
> _rmtree_unsafe
> 2020-09-15T17:47:23.9152756Z onerror(os.unlink, fullname, sys.exc_info())
> 2020-09-15T17:47:23.9153286Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> 2020-09-15T17:47:23.9153756Z 
> 2020-09-15T17:47:23.9155669Z path = 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py35-win\\tmp\\it-87ky_iwj2483994137376\\full'
> 2020-09-15T17:47:23.9156453Z onerror = .onerror at 
> 0x024259F42268>
> 2020-09-15T17:47:23.9156951Z 
> 2020-09-15T17:47:23.9157456Z def _rmtree_unsafe(path, onerror):
> 2020-09-15T17:47:23.9157949Z try:
> 2020-09-15T17:47:23.9158409Z if os.path.islink(path):
> 2020-09-15T17:47:23.9159821Z # symlinks to directories are 
> forbidden, see bug #1669
> 2020-09-15T17:47:23.9161488Z raise OSError("Cannot call 
> rmtree on a symbolic link")
> 2020-09-15T17:47:23.9162128Z except OSError:
> 2020-09-15T17:47:23.9162713Z onerror(os.path.islink, path, 
> sys.exc_info())
> 2020-09-15T17:47:23.9163363Z # can't continue even if onerror 
> hook returns
> 2020-09-15T17:47:23.9163870Z return
> 2020-09-15T17:47:23.9164305Z names = []
> 2020-09-15T17:47:23.9164715Z try:
> 2020-09-15T17:47:23.9165740Z names = os.listdir(path)
> 2020-09-15T17:47:23.9166181Z except OSError:
> 2020-09-15T17:47:23.9166737Z onerror(os.listdir, path, 
> sys.exc_info())
> 2020-09-15T17:47:23.9167277Z for name in names:
> 2020-09-15T17:47:23.9167796Z fullname = os.path.join(path, name)
> 2020-09-15T17:47:23.9168270Z try:
> 2020-09-15T17:47:23.9168993Z mode = os.lstat(fullname).st_mode
> 2020-09-15T17:47:23.9169511Z except OSError:
> 2020-09-15T17:47:23.9169942Z mode = 0
> 2020-09-15T17:47:23.9170392Z if stat.S_ISDIR(mode):
> 2020-09-15T17:47:23.9170991Z _rmtree_unsafe(fullname, onerror)
> 2020-09-15T17:47:23.9171493Z else:
> 2020-09-15T17:47:23.9171895Z  

[jira] [Commented] (BEAM-12178) ReadCacheTest flakes on Windows

2021-04-16 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324074#comment-17324074
 ] 

Ning Kang commented on BEAM-12178:
--

PR#14558 has been merged. Let's keep the ticket open for a while and see 
whether the issue recurs.

> ReadCacheTest flakes on Windows
> ---
>
> Key: BEAM-12178
> URL: https://issues.apache.org/jira/browse/BEAM-12178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P1
>  Labels: currently-failing, flake
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Example failure: 
> https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757
> {code}
>  ReadCacheTest.test_read_cache 
> 
> [gw4] win32 -- Python 3.6.8 
> d:\a\beam\beam\sdks\python\target\.tox\py36-win\scripts\python.exe
> self =  testMethod=test_read_cache>
> def setUp(self):
> > ie.new_env()
> apache_beam\runners\interactive\caching\read_cache_test.py:37: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> apache_beam\runners\interactive\interactive_environment.py:119: in new_env
> _interactive_beam_env.cleanup()
> apache_beam\runners\interactive\interactive_environment.py:269: in cleanup
> self.evict_recording_manager(pipeline)
> apache_beam\runners\interactive\interactive_environment.py:396: in 
> evict_recording_manager
> rm.clear()
> apache_beam\runners\interactive\recording_manager.py:339: in clear
> cache_manager.cleanup()
> apache_beam\runners\interactive\caching\streaming_cache.py:385: in cleanup
> shutil.rmtree(self._cache_dir)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree
> return _rmtree_unsafe(path, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in 
> _rmtree_unsafe
> _rmtree_unsafe(fullname, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in 
> _rmtree_unsafe
> onerror(os.unlink, fullname, sys.exc_info())
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> path = 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full'
> onerror = .onerror at 0x0244BA770A60>
> def _rmtree_unsafe(path, onerror):
> try:
> if os.path.islink(path):
> # symlinks to directories are forbidden, see bug #1669
> raise OSError("Cannot call rmtree on a symbolic link")
> except OSError:
> onerror(os.path.islink, path, sys.exc_info())
> # can't continue even if onerror hook returns
> return
> names = []
> try:
> names = os.listdir(path)
> except OSError:
> onerror(os.listdir, path, sys.exc_info())
> for name in names:
> fullname = os.path.join(path, name)
> try:
> mode = os.lstat(fullname).st_mode
> except OSError:
> mode = 0
> if stat.S_ISDIR(mode):
> _rmtree_unsafe(fullname, onerror)
> else:
> try:
> >   os.unlink(fullname)
> E   PermissionError: [WinError 32] The process cannot access 
> the file because it is being used by another process: 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full\\dacc5c76b6-2494199456376-2494201991240-2494199486000'
> {code}
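
`[WinError 32]` indicates that another handle still had the cache file open when `shutil.rmtree` tried to unlink it. One common mitigation, sketched here as an assumption and not as a description of what PR#14558 changed, is to retry the removal a few times so a short-lived lock can be released. The helper `rmtree_with_retry` and its parameters are hypothetical.

```python
import os
import shutil
import tempfile
import time


def rmtree_with_retry(path, attempts=5, delay_secs=0.2):
  """Removes a directory tree, retrying when a file is temporarily locked.

  Returns the number of attempts used and re-raises the last PermissionError
  if the tree still cannot be removed.
  """
  for attempt in range(1, attempts + 1):
    try:
      shutil.rmtree(path)
      return attempt
    except PermissionError:
      if attempt == attempts:
        raise
      time.sleep(delay_secs)


cache_dir = tempfile.mkdtemp()
with open(os.path.join(cache_dir, 'full'), 'w') as f:
  f.write('cached element')
# The file handle is closed before cleanup, so one attempt suffices here.
used = rmtree_with_retry(cache_dir)
```

Retrying only hides the symptom; closing every reader and writer of the cache directory before cleanup addresses the cause.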





[jira] [Updated] (BEAM-12179) PortableRunnerTestWithExternalEnv failing

2021-04-15 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-12179:
-
Description: 
The test seems to be failing on Ubuntu platform for multiple PRs.  An 
[example|https://github.com/apache/beam/pull/14558/checks?check_run_id=2358281557].

A similar issue in the past is BEAM-8646.

{code}
__ PortableRunnerTestWithExternalEnv.test_assert_that __
[gw1] linux -- Python 3.8.9 
/home/runner/work/beam/beam/sdks/python/target/.tox/py38/bin/python

self = 


def test_assert_that(self):
  # TODO: figure out a way for fn_api_runner to parse and raise the
  # underlying exception.
  with self.assertRaisesRegex(Exception, 'Failed assert'):
with self.create_pipeline() as p:
> assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
E AssertionError: "Failed assert" does not match 
"<_MultiThreadedRendezvous of RPC that terminated with:
E   status = StatusCode.DEADLINE_EXCEEDED
E   details = "Deadline Exceeded"
E   debug_error_string = 
"{"created":"@1618536797.894329055","description":"Error received from peer 
ipv6:[::1]:43041","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Deadline
 Exceeded","grpc_status":4}"
E >"

apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:110: 
AssertionError
{code}


  was:
The test seems to be failing on Ubuntu platform for multiple PRs. 

A similar issue in the past is BEAM-8646.

{code}
__ PortableRunnerTestWithExternalEnv.test_assert_that __
[gw1] linux -- Python 3.8.9 
/home/runner/work/beam/beam/sdks/python/target/.tox/py38/bin/python

self = 


def test_assert_that(self):
  # TODO: figure out a way for fn_api_runner to parse and raise the
  # underlying exception.
  with self.assertRaisesRegex(Exception, 'Failed assert'):
with self.create_pipeline() as p:
> assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
E AssertionError: "Failed assert" does not match 
"<_MultiThreadedRendezvous of RPC that terminated with:
E   status = StatusCode.DEADLINE_EXCEEDED
E   details = "Deadline Exceeded"
E   debug_error_string = 
"{"created":"@1618536797.894329055","description":"Error received from peer 
ipv6:[::1]:43041","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Deadline
 Exceeded","grpc_status":4}"
E >"

apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:110: 
AssertionError
{code}



> PortableRunnerTestWithExternalEnv failing
> -
>
> Key: BEAM-12179
> URL: https://issues.apache.org/jira/browse/BEAM-12179
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures
>Reporter: Ning Kang
>Priority: P2
>
> The test seems to be failing on Ubuntu platform for multiple PRs.  An 
> [example|https://github.com/apache/beam/pull/14558/checks?check_run_id=2358281557].
> A similar issue in the past is BEAM-8646.
> {code}
> __ PortableRunnerTestWithExternalEnv.test_assert_that 
> __
> [gw1] linux -- Python 3.8.9 
> /home/runner/work/beam/beam/sdks/python/target/.tox/py38/bin/python
> self = 
>   testMethod=test_assert_that>
> def test_assert_that(self):
>   # TODO: figure out a way for fn_api_runner to parse and raise the
>   # underlying exception.
>   with self.assertRaisesRegex(Exception, 'Failed assert'):
> with self.create_pipeline() as p:
> > assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
> E AssertionError: "Failed assert" does not match 
> "<_MultiThreadedRendezvous of RPC that terminated with:
> E status = StatusCode.DEADLINE_EXCEEDED
> E details = "Deadline Exceeded"
> E debug_error_string = 
> "{"created":"@1618536797.894329055","description":"Error received from peer 
> ipv6:[::1]:43041","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Deadline
>  Exceeded","grpc_status":4}"
> E >"
> apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:110: 
> AssertionError
> {code}





[jira] [Updated] (BEAM-12179) PortableRunnerTestWithExternalEnv failing

2021-04-15 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-12179:
-
Issue Type: Bug  (was: Improvement)

> PortableRunnerTestWithExternalEnv failing
> -
>
> Key: BEAM-12179
> URL: https://issues.apache.org/jira/browse/BEAM-12179
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ning Kang
>Priority: P2
>
> The test seems to be failing on the Ubuntu platform for multiple PRs. An 
> [example|https://github.com/apache/beam/pull/14558/checks?check_run_id=2358281557].
> A similar issue in the past is BEAM-8646.
> {code}
> __ PortableRunnerTestWithExternalEnv.test_assert_that 
> __
> [gw1] linux -- Python 3.8.9 
> /home/runner/work/beam/beam/sdks/python/target/.tox/py38/bin/python
> self = 
>   testMethod=test_assert_that>
> def test_assert_that(self):
>   # TODO: figure out a way for fn_api_runner to parse and raise the
>   # underlying exception.
>   with self.assertRaisesRegex(Exception, 'Failed assert'):
> with self.create_pipeline() as p:
> > assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
> E AssertionError: "Failed assert" does not match 
> "<_MultiThreadedRendezvous of RPC that terminated with:
> E status = StatusCode.DEADLINE_EXCEEDED
> E details = "Deadline Exceeded"
> E debug_error_string = 
> "{"created":"@1618536797.894329055","description":"Error received from peer 
> ipv6:[::1]:43041","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Deadline
>  Exceeded","grpc_status":4}"
> E >"
> apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:110: 
> AssertionError
> {code}





[jira] [Created] (BEAM-12179) PortableRunnerTestWithExternalEnv failing

2021-04-15 Thread Ning Kang (Jira)
Ning Kang created BEAM-12179:


 Summary: PortableRunnerTestWithExternalEnv failing
 Key: BEAM-12179
 URL: https://issues.apache.org/jira/browse/BEAM-12179
 Project: Beam
  Issue Type: Improvement
  Components: test-failures
Reporter: Ning Kang


The test seems to be failing on the Ubuntu platform for multiple PRs. 

A similar issue in the past is BEAM-8646.

{code}
__ PortableRunnerTestWithExternalEnv.test_assert_that __
[gw1] linux -- Python 3.8.9 
/home/runner/work/beam/beam/sdks/python/target/.tox/py38/bin/python

self = 


def test_assert_that(self):
  # TODO: figure out a way for fn_api_runner to parse and raise the
  # underlying exception.
  with self.assertRaisesRegex(Exception, 'Failed assert'):
with self.create_pipeline() as p:
> assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
E AssertionError: "Failed assert" does not match 
"<_MultiThreadedRendezvous of RPC that terminated with:
E   status = StatusCode.DEADLINE_EXCEEDED
E   details = "Deadline Exceeded"
E   debug_error_string = 
"{"created":"@1618536797.894329055","description":"Error received from peer 
ipv6:[::1]:43041","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Deadline
 Exceeded","grpc_status":4}"
E >"

apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:110: 
AssertionError
{code}
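
The shape of the assertion message above follows from how assertRaisesRegex compares the text of the raised exception against the expected pattern. A minimal sketch, assuming a plain Exception as a stand-in for the gRPC deadline error (the stand-in and the variable names are illustrative and not taken from the Beam test):

```python
import unittest

# A bare TestCase instance is enough to use the assertion helpers.
case = unittest.TestCase()
message = None

try:
  # The pattern expects 'Failed assert', but the stand-in exception carries
  # a deadline message instead, so the regex check itself fails.
  with case.assertRaisesRegex(Exception, 'Failed assert'):
    raise Exception('Deadline Exceeded')
except AssertionError as error:
  message = str(error)

print(message)
```

In other words, the pipeline did raise, but with a deadline error rather than the expected 'Failed assert' text, which is why the report is an AssertionError about a pattern mismatch.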






[jira] [Commented] (BEAM-12178) ReadCacheTest flakes on Windows

2021-04-15 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322529#comment-17322529
 ] 

Ning Kang commented on BEAM-12178:
--

I searched the whole code base and found only 3 occurrences of 
`ie.current_env().set_recording_manager`: 
https://github.com/apache/beam/search?q=set_recording_manager

One of them started a background recording job and added that recording manager 
to the global instance returned by ie.current_env().
I'll remove that occurrence.

> ReadCacheTest flakes on Windows
> ---
>
> Key: BEAM-12178
> URL: https://issues.apache.org/jira/browse/BEAM-12178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P1
>  Labels: currently-failing, flake
>
> Example failure: 
> https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757
> {code}
>  ReadCacheTest.test_read_cache 
> 
> [gw4] win32 -- Python 3.6.8 
> d:\a\beam\beam\sdks\python\target\.tox\py36-win\scripts\python.exe
> self =  testMethod=test_read_cache>
> def setUp(self):
> > ie.new_env()
> apache_beam\runners\interactive\caching\read_cache_test.py:37: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> apache_beam\runners\interactive\interactive_environment.py:119: in new_env
> _interactive_beam_env.cleanup()
> apache_beam\runners\interactive\interactive_environment.py:269: in cleanup
> self.evict_recording_manager(pipeline)
> apache_beam\runners\interactive\interactive_environment.py:396: in 
> evict_recording_manager
> rm.clear()
> apache_beam\runners\interactive\recording_manager.py:339: in clear
> cache_manager.cleanup()
> apache_beam\runners\interactive\caching\streaming_cache.py:385: in cleanup
> shutil.rmtree(self._cache_dir)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree
> return _rmtree_unsafe(path, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in 
> _rmtree_unsafe
> _rmtree_unsafe(fullname, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in 
> _rmtree_unsafe
> onerror(os.unlink, fullname, sys.exc_info())
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> path = 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full'
> onerror = .onerror at 0x0244BA770A60>
> def _rmtree_unsafe(path, onerror):
> try:
> if os.path.islink(path):
> # symlinks to directories are forbidden, see bug #1669
> raise OSError("Cannot call rmtree on a symbolic link")
> except OSError:
> onerror(os.path.islink, path, sys.exc_info())
> # can't continue even if onerror hook returns
> return
> names = []
> try:
> names = os.listdir(path)
> except OSError:
> onerror(os.listdir, path, sys.exc_info())
> for name in names:
> fullname = os.path.join(path, name)
> try:
> mode = os.lstat(fullname).st_mode
> except OSError:
> mode = 0
> if stat.S_ISDIR(mode):
> _rmtree_unsafe(fullname, onerror)
> else:
> try:
> >   os.unlink(fullname)
> E   PermissionError: [WinError 32] The process cannot access 
> the file because it is being used by another process: 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full\\dacc5c76b6-2494199456376-2494201991240-2494199486000'
> {code}





[jira] [Commented] (BEAM-12178) ReadCacheTest flakes on Windows

2021-04-15 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322524#comment-17322524
 ] 

Ning Kang commented on BEAM-12178:
--

I think the root cause is that, on Windows, shutil.rmtree cannot delete a 
directory containing files that are still held open by other processes.

In StreamingCache, there is a place where opening and closing a file are 
decoupled: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py#L116.

The test itself is not flaky. The flakiness comes from cleaning up the 
interactive environment left over from the previous test execution: some of the 
opened file handles are never closed.

I'll do some further investigation.
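
To illustrate the idea, here is a minimal sketch of closing every tracked file handle before removing the cache directory. The helper name cleanup_cache_dir and the explicit handle list are assumptions made for illustration; they are not how StreamingCache is actually structured.

```python
import os
import shutil
import tempfile


def cleanup_cache_dir(cache_dir, open_handles):
  """Closes every tracked handle, then removes the directory tree."""
  for handle in open_handles:
    if not handle.closed:
      handle.close()
  open_handles.clear()
  # With no handle left open, rmtree no longer hits an in-use file.
  shutil.rmtree(cache_dir)


# Hypothetical usage: one file is opened and left open by earlier code.
cache_dir = tempfile.mkdtemp(prefix='it-')
handles = [open(os.path.join(cache_dir, 'full'), 'wb')]
handles[0].write(b'cached bytes')
cleanup_cache_dir(cache_dir, handles)
```

On Linux the rmtree call would succeed even with the handle still open, which may be why the failure only shows up on Windows.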

> ReadCacheTest flakes on Windows
> ---
>
> Key: BEAM-12178
> URL: https://issues.apache.org/jira/browse/BEAM-12178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P1
>  Labels: currently-failing, flake
>
> Example failure: 
> https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757
> {code}
>  ReadCacheTest.test_read_cache 
> 
> [gw4] win32 -- Python 3.6.8 
> d:\a\beam\beam\sdks\python\target\.tox\py36-win\scripts\python.exe
> self =  testMethod=test_read_cache>
> def setUp(self):
> > ie.new_env()
> apache_beam\runners\interactive\caching\read_cache_test.py:37: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> apache_beam\runners\interactive\interactive_environment.py:119: in new_env
> _interactive_beam_env.cleanup()
> apache_beam\runners\interactive\interactive_environment.py:269: in cleanup
> self.evict_recording_manager(pipeline)
> apache_beam\runners\interactive\interactive_environment.py:396: in 
> evict_recording_manager
> rm.clear()
> apache_beam\runners\interactive\recording_manager.py:339: in clear
> cache_manager.cleanup()
> apache_beam\runners\interactive\caching\streaming_cache.py:385: in cleanup
> shutil.rmtree(self._cache_dir)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree
> return _rmtree_unsafe(path, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in 
> _rmtree_unsafe
> _rmtree_unsafe(fullname, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in 
> _rmtree_unsafe
> onerror(os.unlink, fullname, sys.exc_info())
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> path = 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full'
> onerror = .onerror at 0x0244BA770A60>
> def _rmtree_unsafe(path, onerror):
> try:
> if os.path.islink(path):
> # symlinks to directories are forbidden, see bug #1669
> raise OSError("Cannot call rmtree on a symbolic link")
> except OSError:
> onerror(os.path.islink, path, sys.exc_info())
> # can't continue even if onerror hook returns
> return
> names = []
> try:
> names = os.listdir(path)
> except OSError:
> onerror(os.listdir, path, sys.exc_info())
> for name in names:
> fullname = os.path.join(path, name)
> try:
> mode = os.lstat(fullname).st_mode
> except OSError:
> mode = 0
> if stat.S_ISDIR(mode):
> _rmtree_unsafe(fullname, onerror)
> else:
> try:
> >   os.unlink(fullname)
> E   PermissionError: [WinError 32] The process cannot access 
> the file because it is being used by another process: 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full\\dacc5c76b6-2494199456376-2494201991240-2494199486000'
> {code}





[jira] [Work started] (BEAM-12178) ReadCacheTest flakes on Windows

2021-04-15 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-12178 started by Ning Kang.

> ReadCacheTest flakes on Windows
> ---
>
> Key: BEAM-12178
> URL: https://issues.apache.org/jira/browse/BEAM-12178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P1
>  Labels: currently-failing, flake
>
> Example failure: 
> https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757
> {code}
>  ReadCacheTest.test_read_cache 
> 
> [gw4] win32 -- Python 3.6.8 
> d:\a\beam\beam\sdks\python\target\.tox\py36-win\scripts\python.exe
> self =  testMethod=test_read_cache>
> def setUp(self):
> > ie.new_env()
> apache_beam\runners\interactive\caching\read_cache_test.py:37: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> apache_beam\runners\interactive\interactive_environment.py:119: in new_env
> _interactive_beam_env.cleanup()
> apache_beam\runners\interactive\interactive_environment.py:269: in cleanup
> self.evict_recording_manager(pipeline)
> apache_beam\runners\interactive\interactive_environment.py:396: in 
> evict_recording_manager
> rm.clear()
> apache_beam\runners\interactive\recording_manager.py:339: in clear
> cache_manager.cleanup()
> apache_beam\runners\interactive\caching\streaming_cache.py:385: in cleanup
> shutil.rmtree(self._cache_dir)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree
> return _rmtree_unsafe(path, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in 
> _rmtree_unsafe
> _rmtree_unsafe(fullname, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in 
> _rmtree_unsafe
> onerror(os.unlink, fullname, sys.exc_info())
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> path = 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full'
> onerror = .onerror at 0x0244BA770A60>
> def _rmtree_unsafe(path, onerror):
> try:
> if os.path.islink(path):
> # symlinks to directories are forbidden, see bug #1669
> raise OSError("Cannot call rmtree on a symbolic link")
> except OSError:
> onerror(os.path.islink, path, sys.exc_info())
> # can't continue even if onerror hook returns
> return
> names = []
> try:
> names = os.listdir(path)
> except OSError:
> onerror(os.listdir, path, sys.exc_info())
> for name in names:
> fullname = os.path.join(path, name)
> try:
> mode = os.lstat(fullname).st_mode
> except OSError:
> mode = 0
> if stat.S_ISDIR(mode):
> _rmtree_unsafe(fullname, onerror)
> else:
> try:
> >   os.unlink(fullname)
> E   PermissionError: [WinError 32] The process cannot access 
> the file because it is being used by another process: 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full\\dacc5c76b6-2494199456376-2494201991240-2494199486000'
> {code}





[jira] [Assigned] (BEAM-12178) ReadCacheTest flakes on Windows

2021-04-15 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reassigned BEAM-12178:


Assignee: Ning Kang

> ReadCacheTest flakes on Windows
> ---
>
> Key: BEAM-12178
> URL: https://issues.apache.org/jira/browse/BEAM-12178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P1
>  Labels: currently-failing, flake
>
> Example failure: 
> https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757
> {code}
>  ReadCacheTest.test_read_cache 
> 
> [gw4] win32 -- Python 3.6.8 
> d:\a\beam\beam\sdks\python\target\.tox\py36-win\scripts\python.exe
> self =  testMethod=test_read_cache>
> def setUp(self):
> > ie.new_env()
> apache_beam\runners\interactive\caching\read_cache_test.py:37: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> apache_beam\runners\interactive\interactive_environment.py:119: in new_env
> _interactive_beam_env.cleanup()
> apache_beam\runners\interactive\interactive_environment.py:269: in cleanup
> self.evict_recording_manager(pipeline)
> apache_beam\runners\interactive\interactive_environment.py:396: in 
> evict_recording_manager
> rm.clear()
> apache_beam\runners\interactive\recording_manager.py:339: in clear
> cache_manager.cleanup()
> apache_beam\runners\interactive\caching\streaming_cache.py:385: in cleanup
> shutil.rmtree(self._cache_dir)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree
> return _rmtree_unsafe(path, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in 
> _rmtree_unsafe
> _rmtree_unsafe(fullname, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in 
> _rmtree_unsafe
> onerror(os.unlink, fullname, sys.exc_info())
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> path = 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full'
> onerror = .onerror at 0x0244BA770A60>
> def _rmtree_unsafe(path, onerror):
> try:
> if os.path.islink(path):
> # symlinks to directories are forbidden, see bug #1669
> raise OSError("Cannot call rmtree on a symbolic link")
> except OSError:
> onerror(os.path.islink, path, sys.exc_info())
> # can't continue even if onerror hook returns
> return
> names = []
> try:
> names = os.listdir(path)
> except OSError:
> onerror(os.listdir, path, sys.exc_info())
> for name in names:
> fullname = os.path.join(path, name)
> try:
> mode = os.lstat(fullname).st_mode
> except OSError:
> mode = 0
> if stat.S_ISDIR(mode):
> _rmtree_unsafe(fullname, onerror)
> else:
> try:
> >   os.unlink(fullname)
> E   PermissionError: [WinError 32] The process cannot access 
> the file because it is being used by another process: 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full\\dacc5c76b6-2494199456376-2494201991240-2494199486000'
> {code}





[jira] [Commented] (BEAM-12178) ReadCacheTest flakes on Windows

2021-04-15 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322499#comment-17322499
 ] 

Ning Kang commented on BEAM-12178:
--

I think it's something with Python file IO on Windows. A similar issue: 
https://stackoverflow.com/questions/33322932/windowserror-error-32-the-process-cannot-access-the-file-because-it-is-being.

I suggest temporarily disabling the test on Windows.
Let me make a change to do so.
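
One way to do that is unittest's skipIf decorator keyed on sys.platform. A minimal sketch, assuming a stand-in test class (ReadCacheTestSketch is hypothetical; the real test lives in read_cache_test.py and the actual change may look different):

```python
import io
import sys
import unittest


@unittest.skipIf(
    sys.platform.startswith('win'),
    'Skipped on Windows: cache cleanup hits WinError 32 (BEAM-12178).')
class ReadCacheTestSketch(unittest.TestCase):
  def test_read_cache(self):
    # Placeholder body; the real test exercises the interactive cache.
    self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReadCacheTestSketch)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

On Windows the test is reported as skipped instead of failed; on other platforms it runs as usual.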

> ReadCacheTest flakes on Windows
> ---
>
> Key: BEAM-12178
> URL: https://issues.apache.org/jira/browse/BEAM-12178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Brian Hulette
>Priority: P1
>  Labels: currently-failing, flake
>
> Example failure: 
> https://github.com/apache/beam/pull/14382/checks?check_run_id=2325304757
> {code}
>  ReadCacheTest.test_read_cache 
> 
> [gw4] win32 -- Python 3.6.8 
> d:\a\beam\beam\sdks\python\target\.tox\py36-win\scripts\python.exe
> self =  testMethod=test_read_cache>
> def setUp(self):
> > ie.new_env()
> apache_beam\runners\interactive\caching\read_cache_test.py:37: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> apache_beam\runners\interactive\interactive_environment.py:119: in new_env
> _interactive_beam_env.cleanup()
> apache_beam\runners\interactive\interactive_environment.py:269: in cleanup
> self.evict_recording_manager(pipeline)
> apache_beam\runners\interactive\interactive_environment.py:396: in 
> evict_recording_manager
> rm.clear()
> apache_beam\runners\interactive\recording_manager.py:339: in clear
> cache_manager.cleanup()
> apache_beam\runners\interactive\caching\streaming_cache.py:385: in cleanup
> shutil.rmtree(self._cache_dir)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree
> return _rmtree_unsafe(path, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in 
> _rmtree_unsafe
> _rmtree_unsafe(fullname, onerror)
> c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in 
> _rmtree_unsafe
> onerror(os.unlink, fullname, sys.exc_info())
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> path = 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full'
> onerror = .onerror at 0x0244BA770A60>
> def _rmtree_unsafe(path, onerror):
> try:
> if os.path.islink(path):
> # symlinks to directories are forbidden, see bug #1669
> raise OSError("Cannot call rmtree on a symbolic link")
> except OSError:
> onerror(os.path.islink, path, sys.exc_info())
> # can't continue even if onerror hook returns
> return
> names = []
> try:
> names = os.listdir(path)
> except OSError:
> onerror(os.listdir, path, sys.exc_info())
> for name in names:
> fullname = os.path.join(path, name)
> try:
> mode = os.lstat(fullname).st_mode
> except OSError:
> mode = 0
> if stat.S_ISDIR(mode):
> _rmtree_unsafe(fullname, onerror)
> else:
> try:
> >   os.unlink(fullname)
> E   PermissionError: [WinError 32] The process cannot access 
> the file because it is being used by another process: 
> 'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-3ggiapgo2494199486000\\full\\dacc5c76b6-2494199456376-2494201991240-2494199486000'
> {code}





[jira] [Commented] (BEAM-12165) ParquetIO sink should allow to pass an Avro data model

2021-04-14 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321209#comment-17321209
 ] 

Ning Kang commented on BEAM-12165:
--

I closed Pull Request #14533 since I don't have much knowledge of the 
package, or of Parquet or Avro.
I'll leave the issue to better hands to send a proper fix.

> ParquetIO sink should allow to pass an Avro data model
> --
>
> Key: BEAM-12165
> URL: https://issues.apache.org/jira/browse/BEAM-12165
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-parquet
>Reporter: Ning Kang
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> AvroParquetWriter instantiated in ParquetIO [1] does not specify the data 
> model.
> The default is SpecificData model [2], while the AvroParquetReader is reading 
> with a GenericData model [3].
> ParquetIO should pass in the correct data model.
> [1] 
> https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1052
> [2] 
> https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroParquetWriter.java#L163
> [3] 
> https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L704





[jira] [Assigned] (BEAM-12165) ParquetIO sink should allow to pass an Avro data model

2021-04-14 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reassigned BEAM-12165:


Assignee: (was: Ning Kang)

> ParquetIO sink should allow to pass an Avro data model
> --
>
> Key: BEAM-12165
> URL: https://issues.apache.org/jira/browse/BEAM-12165
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-parquet
>Reporter: Ning Kang
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> AvroParquetWriter instantiated in ParquetIO [1] does not specify the data 
> model.
> The default is SpecificData model [2], while the AvroParquetReader is reading 
> with a GenericData model [3].
> ParquetIO should pass in the correct data model.
> [1] 
> https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1052
> [2] 
> https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroParquetWriter.java#L163
> [3] 
> https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L704





[jira] [Created] (BEAM-12165) ParquetIO does not pass in correct Avro data model

2021-04-13 Thread Ning Kang (Jira)
Ning Kang created BEAM-12165:


 Summary: ParquetIO does not pass in correct Avro data model
 Key: BEAM-12165
 URL: https://issues.apache.org/jira/browse/BEAM-12165
 Project: Beam
  Issue Type: Improvement
  Components: io-java-parquet
Reporter: Ning Kang
Assignee: Ning Kang


AvroParquetWriter instantiated in ParquetIO [1] does not specify the data model.

The default is SpecificData model [2], while the AvroParquetReader is reading 
with a GenericData model [3].

ParquetIO should pass in the correct data model.


[1] 
https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1052
[2] 
https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroParquetWriter.java#L163
[3] 
https://github.com/apache/beam/blob/v2.28.0/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L704





[jira] [Updated] (BEAM-12096) Flake: test_progress_in_HTML_JS_when_in_notebook

2021-04-05 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-12096:
-
Description: 
* 
https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/18008/testReport/junit/apache_beam.runners.interactive.utils_test/ProgressIndicatorTest/test_progress_in_HTML_JS_when_in_notebook/

self = 
def test_progress_in_HTML_JS_when_in_notebook(self):
  ie.current_env()._is_in_notebook = True
  pi_path = 'apache_beam.runners.interactive.utils.ProgressIndicator'
  with patch('IPython.core.display.HTML') as mocked_html, \
    patch('IPython.core.display.Javascript') as mocked_javascript, \
    patch(pi_path + '.spinner_template') as enter_template, \
    patch(pi_path + '.spinner_removal_template') as exit_template:
    enter_template.format.return_value = 'enter'
    exit_template.format.return_value = 'exit'
    @utils.progress_indicated
    def progress_indicated_dummy():
      mocked_html.assert_any_call('enter')
>   progress_indicated_dummy()
apache_beam/runners/interactive/utils_test.py:217: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
apache_beam/runners/interactive/utils.py:235: in run_within_progress_indicator
    return func(*args, **kwargs)
apache_beam/runners/interactive/utils_test.py:215: in progress_indicated_dummy
    mocked_html.assert_any_call('enter')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = , args = ('enter',)
kwargs = {}, expected = (('enter',), {})
actual = [call('\n https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...";|https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...]
>You have limited Interactive Beam features since your ipython kernel is not connected any notebook frontend.')]
cause = None, expected_string = "HTML('enter')"
    def assert_any_call(self, *args, **kwargs):
        """assert the mock has been called with the specified arguments.
        The assert passes if the mock has *ever* been called, unlike
        `assert_called_with` and `assert_called_once_with` that only pass if
        the call is the most recent one."""
        expected = self._call_matcher((args, kwargs))
        actual = [self._call_matcher(c) for c in self.call_args_list]
        if expected not in actual:
            cause = expected if isinstance(expected, Exception) else None
            expected_string = self._format_mock_call_signature(args, kwargs)
            raise AssertionError(
                '%s call not found' % expected_string
>           ) from cause
E           AssertionError: HTML('enter') call not found
/usr/lib/python3.7/unittest/mock.py:891: AssertionError

  was:
https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/18008/testReport/junit/apache_beam.runners.interactive.utils_test/ProgressIndicatorTest/test_progress_in_HTML_JS_when_in_notebook/

self = 
def test_progress_in_HTML_JS_when_in_notebook(self):
  ie.current_env()._is_in_notebook = True
  pi_path = 'apache_beam.runners.interactive.utils.ProgressIndicator'
  with patch('IPython.core.display.HTML') as mocked_html, \
    patch('IPython.core.display.Javascript') as mocked_javascript, \
    patch(pi_path + '.spinner_template') as enter_template, \
    patch(pi_path + '.spinner_removal_template') as exit_template:
    enter_template.format.return_value = 'enter'
    exit_template.format.return_value = 'exit'
    @utils.progress_indicated
    def progress_indicated_dummy():
      mocked_html.assert_any_call('enter')
>   progress_indicated_dummy()
apache_beam/runners/interactive/utils_test.py:217: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
apache_beam/runners/interactive/utils.py:235: in run_within_progress_indicator
    return func(*args, **kwargs)
apache_beam/runners/interactive/utils_test.py:215: in progress_indicated_dummy
    mocked_html.assert_any_call('enter')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = , args = ('enter',)
kwargs = {}, expected = (('enter',), {})
actual = [call('\n https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...";|https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...]
>You have limited Interactive Beam features since your ipython kernel is not connected any notebook frontend.')]
cause = None, expected_string = "HTML('enter')"
    def assert_any_call(self, *args, **kwargs):
        """assert the mock has been called with the specified arguments.
        The assert passes if the mock has *ever* been called, unlike
        `assert_called_with` and `assert_called_once_with` that only pass if
        the call is the most recent one."""
        expected = self._call_matcher((args, kwargs))
        actual = [self._call_matcher(c) for c in self.call_args_list]
        if expected not in actual:
            cause = expected if isinstance(expected, Exception) else None
            expected_string = self._format_mock_call_signature(args, kwargs)
            raise AssertionError(
                '%s call not found' % expected_string
>           ) from cause
E           AssertionError: HTML('enter') call not found
/usr/lib/python3.7/unittest

[jira] [Work started] (BEAM-12096) Flake: test_progress_in_HTML_JS_when_in_notebook

2021-04-05 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-12096 started by Ning Kang.

> Flake: test_progress_in_HTML_JS_when_in_notebook
> 
>
> Key: BEAM-12096
> URL: https://issues.apache.org/jira/browse/BEAM-12096
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Ning Kang
>Priority: P1
>  Labels: flaky-test
>
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/18008/testReport/junit/apache_beam.runners.interactive.utils_test/ProgressIndicatorTest/test_progress_in_HTML_JS_when_in_notebook/
> self =  testMethod=test_progress_in_HTML_JS_when_in_notebook>
> def test_progress_in_HTML_JS_when_in_notebook(self):
>   ie.current_env()._is_in_notebook = True
>   pi_path = 'apache_beam.runners.interactive.utils.ProgressIndicator'
>   with patch('IPython.core.display.HTML') as mocked_html, \
>     patch('IPython.core.display.Javascript') as mocked_javascript, \
>     patch(pi_path + '.spinner_template') as enter_template, \
>     patch(pi_path + '.spinner_removal_template') as exit_template:
>     enter_template.format.return_value = 'enter'
>     exit_template.format.return_value = 'exit'
>     @utils.progress_indicated
>     def progress_indicated_dummy():
>       mocked_html.assert_any_call('enter')
> >   progress_indicated_dummy()
> apache_beam/runners/interactive/utils_test.py:217: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> apache_beam/runners/interactive/utils.py:235: in run_within_progress_indicator
>     return func(*args, **kwargs)
> apache_beam/runners/interactive/utils_test.py:215: in progress_indicated_dummy
>     mocked_html.assert_any_call('enter')
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> self = , args = ('enter',)
> kwargs = {}, expected = (('enter',), {})
> actual = [call('\n https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...";|https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min...]
> >You have limited Interactive Beam features since your ipython kernel is not connected any notebook frontend.')]
> cause = None, expected_string = "HTML('enter')"
>     def assert_any_call(self, *args, **kwargs):
>         """assert the mock has been called with the specified arguments.
>         The assert passes if the mock has *ever* been called, unlike
>         `assert_called_with` and `assert_called_once_with` that only pass if
>         the call is the most recent one."""
>         expected = self._call_matcher((args, kwargs))
>         actual = [self._call_matcher(c) for c in self.call_args_list]
>         if expected not in actual:
>             cause = expected if isinstance(expected, Exception) else None
>             expected_string = self._format_mock_call_signature(args, kwargs)
>             raise AssertionError(
>                 '%s call not found' % expected_string
> >           ) from cause
> E           AssertionError: HTML('enter') call not found
> /usr/lib/python3.7/unittest/mock.py:891: AssertionError



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10514) Make sure Interactive Beam cache file path length does not exceed OS limits

2021-04-05 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314971#comment-17314971
 ] 

Ning Kang commented on BEAM-10514:
--

This should have been fixed.

> Make sure Interactive Beam cache file path length does not exceed OS limits
> ---
>
> Key: BEAM-10514
> URL: https://issues.apache.org/jira/browse/BEAM-10514
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Priority: P3
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The path length limit in Linux is 4096.
> The limit in the Windows API is 260.
> If long file name support is enabled in Windows, the limit will be 32767.
> An example path that doesn't work on Windows is 
> c:\windows\temp\interactive-temp-xqj_fv471021776\cache-20-07-16-08_28_57\full\beam-temp-anonymous_pcollection_433934912-433934912-433938272-471021776-65e3a91ec73e11ea91b7e69191fc0bf0\284e0d91-ca8a-463c-a894-bde70cdec599.anonymous_pcollection_433934912-433934912-433938272-471021776
>  (length: 281).
> Consider using obfuscation when mapping in-memory PCollections into files.
> If a long file name is needed, consider storing a reverse-indexed mapping in 
> memory.
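
The obfuscation idea in the ticket could look roughly like the sketch below: 
each long PCollection label is hashed into a short, fixed-length file name and 
a reverse mapping is kept in memory. `ShortNameMapper` and its methods are 
hypothetical names used for illustration only; they are not part of 
Interactive Beam, and this is not necessarily how the fix was implemented.

```python
import hashlib


class ShortNameMapper:
  """Hypothetical helper: maps long cache labels to short file names."""

  def __init__(self, digest_chars=16):
    self._digest_chars = digest_chars
    self._reverse = {}  # short name -> original label

  def shorten(self, label):
    # A truncated SHA-256 hex digest gives a fixed-length, path-safe name.
    digest = hashlib.sha256(label.encode('utf-8')).hexdigest()
    short = digest[:self._digest_chars]
    self._reverse[short] = label
    return short

  def original(self, short):
    # Reverse lookup from the short file name back to the full label.
    return self._reverse[short]


mapper = ShortNameMapper()
label = (
    'anonymous_pcollection_433934912-433934912-433938272-471021776'
    '-65e3a91ec73e11ea91b7e69191fc0bf0')
short = mapper.shorten(label)
```

A truncated digest can collide in principle, so a real implementation would 
need to pick a digest length that makes this acceptably unlikely or detect 
collisions when inserting into the reverse mapping.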





[jira] [Updated] (BEAM-10514) Make sure Interactive Beam cache file path length does not exceed OS limits

2021-04-05 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10514:
-
Resolution: Fixed
Status: Resolved  (was: Open)

> Make sure Interactive Beam cache file path length does not exceed OS limits
> ---
>
> Key: BEAM-10514
> URL: https://issues.apache.org/jira/browse/BEAM-10514
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Priority: P3
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The path length limit in Linux is 4096.
> The limit in the Windows API is 260.
> If long file name support is enabled in Windows, the limit will be 32767.
> An example path that doesn't work on Windows is 
> c:\windows\temp\interactive-temp-xqj_fv471021776\cache-20-07-16-08_28_57\full\beam-temp-anonymous_pcollection_433934912-433934912-433938272-471021776-65e3a91ec73e11ea91b7e69191fc0bf0\284e0d91-ca8a-463c-a894-bde70cdec599.anonymous_pcollection_433934912-433934912-433938272-471021776
>  (length: 281).
> Consider using obfuscation when mapping in-memory PCollections into files.
> If a long file name is needed, consider storing a reverse-indexed mapping in 
> memory.





[jira] [Commented] (BEAM-9630) Python SDK streaming from Pubsub getting error `grpc.StatusRuntimeException: CANCELLED: call already cancelled`

2021-04-02 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314095#comment-17314095
 ] 

Ning Kang commented on BEAM-9630:
-

Is this something a user can ignore? A similar question was asked 
[here](https://stackoverflow.com/questions/66893912/apache-beam-statusruntimeexception-on-dataflow-pipeline).

> Python SDK streaming from Pubsub getting error `grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled`
> ---
>
> Key: BEAM-9630
> URL: https://issues.apache.org/jira/browse/BEAM-9630
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.19.0
> Environment: python==3.7.5
> apache-beam[gcp]==2.19.0
> google-cloud-pubsub==1.4.2
>Reporter: Kan Dong
>Priority: P2
> Fix For: Not applicable
>
>
> I have a Dataflow streaming job using Apache Beam Python 3.7 SDK 2.19.0.  The 
> job consumes Pub/Sub messages, processes the data, and publishes the output to 
> Pub/Sub.  Periodically, I get the error messages below and the worker stops 
> consuming messages.
> ```Error message from worker: java.util.concurrent.ExecutionException: 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: cancelled before receiving half close 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) 
> org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57) 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:332)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1350)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1100(StreamingDataflowWorker.java:152)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$7.run(StreamingDataflowWorker.java:1073)
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  java.lang.Thread.run(Thread.java:748) Caused by: 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: cancelled before receiving half close 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:273)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:337)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:793)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  java.lang.Thread.run(Thread.java:748) 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
>  
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.finish(RemoteGrpcPortWriteOperation.java:21

[jira] [Commented] (BEAM-11797) Flaky interactive test in Precommit

2021-03-30 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311758#comment-17311758
 ] 

Ning Kang commented on BEAM-11797:
--

Found the root cause: `assert_called_with` does not assert that the mock has 
ever been called with the given arguments; it asserts that the most recent 
call matches them. There is also an IPythonLogHandler that emits logs from 
time to time, which is not deterministic when running in a Jenkins 
environment. The handler uses the same display module that is used by the 
decorator under test, causing the flakiness. 
Replaced `assert_called_with` with `assert_any_call` to fix it.
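
The difference between the two assertions can be reproduced with a bare 
`unittest.mock.Mock`. In this minimal sketch, `display` is a hypothetical 
stand-in for the mocked display callable, and the second call plays the role 
of an unrelated log emission arriving after the call the test cares about.

```python
from unittest import mock

display = mock.Mock()
display('enter')        # the call the test cares about
display('log message')  # a later, unrelated call (e.g. from a log handler)

# assert_any_call passes: 'enter' was one of the calls made.
display.assert_any_call('enter')

# assert_called_with only checks the most recent call, so it fails here.
try:
  display.assert_called_with('enter')
  last_call_matched = True
except AssertionError:
  last_call_matched = False
```

Because the extra call only happens when a log record is emitted at the wrong 
moment, a test using `assert_called_with` passes most of the time and fails 
occasionally, which matches the flakiness described above.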

> Flaky interactive test in Precommit
> ---
>
> Key: BEAM-11797
> URL: https://issues.apache.org/jira/browse/BEAM-11797
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-11797) Flaky interactive test in Precommit

2021-03-12 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300537#comment-17300537
 ] 

Ning Kang commented on BEAM-11797:
--

Working on it now.

> Flaky interactive test in Precommit
> ---
>
> Key: BEAM-11797
> URL: https://issues.apache.org/jira/browse/BEAM-11797
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Ning Kang
>Priority: P2
>






[jira] [Updated] (BEAM-8288) Cleanup Interactive Beam Python 2 support

2021-03-12 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-8288:

Resolution: Fixed
Status: Resolved  (was: Open)

> Cleanup Interactive Beam Python 2 support
> -
>
> Key: BEAM-8288
> URL: https://issues.apache.org/jira/browse/BEAM-8288
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Priority: P3
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As Beam is retiring Python 2, some special handling in Interactive Beam code 
> and tests will need to be cleaned up.
> This Jira ticket tracks those cleanup changes.





[jira] [Commented] (BEAM-8288) Cleanup Interactive Beam Python 2 support

2021-03-12 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300536#comment-17300536
 ] 

Ning Kang commented on BEAM-8288:
-

[~yoshiki.obata] Thanks, closing it.

> Cleanup Interactive Beam Python 2 support
> -
>
> Key: BEAM-8288
> URL: https://issues.apache.org/jira/browse/BEAM-8288
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Priority: P3
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As Beam is retiring Python 2, some special handling in Interactive Beam code 
> and tests will need to be cleaned up.
> This Jira ticket tracks those cleanup changes.





[jira] [Created] (BEAM-11906) No trigger early repeatedly for session windows

2021-03-01 Thread Ning Kang (Jira)
Ning Kang created BEAM-11906:


 Summary: No trigger early repeatedly for session windows
 Key: BEAM-11906
 URL: https://issues.apache.org/jira/browse/BEAM-11906
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Affects Versions: 2.28.0, 2.23.0
Reporter: Ning Kang


Originated from: 
https://stackoverflow.com/questions/66381608/apache-beam-does-not-trigger-early-repeatedly-for-session-windows-on-google-data

The following pipeline fires early after each element when running locally 
using DirectRunner, but there are no early firings when running on Google 
Cloud Dataflow. On Dataflow it triggers only after the session window has 
closed.

{code:python}
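import json
import logging

import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger
from apache_beam.transforms.trigger import AccumulationMode

# `p` is assumed to be an existing streaming beam.Pipeline.
( p
    | 'read'   >> beam.io.ReadFromPubSub(
        subscription='projects/xxx/subscriptions/xxx-sub')
    | 'json'   >> beam.Map(lambda x: json.loads(x.decode('utf-8')))
    | 'kv'     >> beam.Map(lambda x: (x['id'], x['amount']))
    | 'window' >> beam.WindowInto(
        window.Sessions(15*60),
        trigger=trigger.Repeatedly(trigger.AfterCount(1)),
        accumulation_mode=AccumulationMode.ACCUMULATING)
    | 'group'  >> beam.GroupByKey()
    | 'log'    >> beam.Map(lambda x: logging.info(x))
)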
{code}

Apache Beam versions tried: 2.23 and 2.28.





[jira] [Assigned] (BEAM-11666) apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution is flaky

2021-02-19 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reassigned BEAM-11666:


Assignee: Sam Rohde  (was: Ning Kang)

> apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
>  is flaky
> -
>
> Key: BEAM-11666
> URL: https://issues.apache.org/jira/browse/BEAM-11666
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Sam Rohde
>Priority: P1
>  Labels: stale-assigned
>
> Happened in: https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/16819
> {noformat}
> self = 
>  testMethod=test_basic_execution>
> @unittest.skipIf(
> sys.version_info < (3, 6, 0),
> 'This test requires at least Python 3.6 to work.')
> def test_basic_execution(self):
>   """A basic pipeline to be used as a smoke test."""
> 
>   # Create the pipeline that will emit 0, 1, 2.
>   p = beam.Pipeline(InteractiveRunner())
>   numbers = p | 'numbers' >> beam.Create([0, 1, 2])
>   letters = p | 'letters' >> beam.Create(['a', 'b', 'c'])
> 
>   # Watch the pipeline and PCollections. This is normally done in a notebook
>   # environment automatically, but we have to do it manually here.
>   ib.watch(locals())
>   ie.current_env().track_user_pipelines()
> 
>   # Create the recording objects. By calling `record` a new PipelineFragment
>   # is started to compute the given PCollections and cache to disk.
>   rm = RecordingManager(p)
> > numbers_recording = rm.record([numbers], max_n=3, max_duration=500)
> apache_beam/runners/interactive/recording_manager_test.py:331: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> apache_beam/runners/interactive/recording_manager.py:435: in record
> self._clear(pipeline_instrument)
> apache_beam/runners/interactive/recording_manager.py:319: in _clear
> self._clear_pcolls(cache_manager, set(to_clear))
> apache_beam/runners/interactive/recording_manager.py:323: in _clear_pcolls
> cache_manager.clear('full', pc)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
>  object at 0x7fa3903ac208>
> labels = ('full', 
> 'ee5c35ce3d-140340882711664-140340882712560-140340476166608')
> def clear(self, *labels):
>   # type (*str) -> Boolean
> 
>   """Clears the cache entry of the given labels and returns True on 
> success.
> 
>   Args:
> value: An encodable (with corresponding PCoder) value
> *labels: List of labels for PCollection instance
>   """
> > raise NotImplementedError
> E NotImplementedError
> {noformat}





[jira] [Updated] (BEAM-11588) beam_PreCommit_PythonDocs_Cron failing

2021-02-09 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11588:
-
Resolution: Fixed
Status: Resolved  (was: Open)

Marking the issue as fixed.

> beam_PreCommit_PythonDocs_Cron failing
> --
>
> Key: BEAM-11588
> URL: https://issues.apache.org/jira/browse/BEAM-11588
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Ning Kang
>Priority: P0
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Error: 04:13:26 jupyter-client 6.1.10 has requirement jedi<=0.17.2, but you 
> have jedi 0.18.0.
> Example Log: 
> https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/584/console
> It seems like this happened because a new version of jupyter-client was 
> released that is not compatible with jedi 0.18.0 (more here: 
> https://github.com/jupyter/jupyter_client/issues/597)
> Attempting to put an upper limit on jupyter-client did not work 
> (https://github.com/apache/beam/pull/13709) because ipykernel installs the 
> latest version of jupyter-client first. The current dependency tree looks like 
> (omitting unrelated parts):
> ipykernel==5.4.2
>   - ipython [required: >=5.0.0, installed: 7.19.0]
>     - jedi [required: >=0.10, installed: 0.17.0]
>   - jupyter-client [required: Any, installed: 6.1.10]
>     - jedi [required: <=0.17.2, installed: 0.17.0]
> Potential solutions:
> - Force install jedi <= 0.17.2
> - Force install jupyter-client <= 6.1.7
> Also, we can probably remove jupyter-client as an explicit dependency, since 
> ipykernel already depends on it.





[jira] [Commented] (BEAM-11476) flaky test: test_track_user_pipeline_cleanup_non_inspectable_pipeline

2021-02-02 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277528#comment-17277528
 ] 

Ning Kang commented on BEAM-11476:
--

Thanks Brian, the flaky (failed) test is here: 
https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/17094/testReport/junit/apache_beam.runners.interactive.interactive_environment_test/InteractiveEnvironmentTest/test_track_user_pipeline_cleanup_non_inspectable_pipeline_2/

The `Standard Errors` are warning logs. I'll see if I can disable the log 
handler in tests.
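
One way to keep a log handler from interfering with a test is to detach it 
for the duration of the test and re-attach it afterwards. The sketch below 
uses only the standard `logging` module; `NoisyHandler` and `detach_handlers` 
are hypothetical names standing in for the real handler and whatever helper 
the test suite would use, so treat it as an illustration of the idea and not 
as the change that was eventually made.

```python
import logging


class NoisyHandler(logging.Handler):
  """Hypothetical stand-in for a handler that displays log records."""

  def emit(self, record):
    pass


def detach_handlers(logger, handler_type):
  """Removes and returns all handlers of handler_type from logger."""
  removed = [h for h in logger.handlers if isinstance(h, handler_type)]
  for handler in removed:
    logger.removeHandler(handler)
  return removed


logger = logging.getLogger('interactive_beam_sketch')
logger.addHandler(NoisyHandler())

removed = detach_handlers(logger, NoisyHandler)  # e.g. in setUp()
handlers_during_test = list(logger.handlers)
# ... the test body would run here without the handler attached ...
for handler in removed:  # e.g. in tearDown()
  logger.addHandler(handler)
```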

> flaky test: test_track_user_pipeline_cleanup_non_inspectable_pipeline
> -
>
> Key: BEAM-11476
> URL: https://issues.apache.org/jira/browse/BEAM-11476
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Ning Kang
>Priority: P1
>  Labels: currently-failing, flake
>
> Test name: 
> apache_beam.runners.interactive.interactive_environment_test.InteractiveEnvironmentTest.test_track_user_pipeline_cleanup_non_inspectable_pipeline
> {code}
> Error Message
> AssertionError: Expected 'cleanup' to not have been called. Called 1 times.
> Stacktrace
> self = 
>   testMethod=test_track_user_pipeline_cleanup_non_inspectable_pipeline>
> mocked_cleanup = 
> @patch(
> 'apache_beam.runners.interactive.interactive_environment'
> '.InteractiveEnvironment.cleanup')
> def test_track_user_pipeline_cleanup_non_inspectable_pipeline(
> self, mocked_cleanup):
>   ie._interactive_beam_env = None
>   ie.new_env()
>   dummy_pipeline_1 = beam.Pipeline()
>   dummy_pipeline_2 = beam.Pipeline()
>   dummy_pipeline_3 = beam.Pipeline()
>   dummy_pipeline_4 = beam.Pipeline()
>   dummy_pcoll = dummy_pipeline_4 | beam.Create([1])
>   dummy_pipeline_5 = beam.Pipeline()
>   dummy_non_inspectable_pipeline = 'dummy'
>   ie.current_env().watch(locals())
>   from apache_beam.runners.interactive.background_caching_job import BackgroundCachingJob
>   ie.current_env().set_background_caching_job(
>       dummy_pipeline_1,
>       BackgroundCachingJob(
>           runner.PipelineResult(runner.PipelineState.DONE), limiters=[]))
>   ie.current_env().set_test_stream_service_controller(dummy_pipeline_2, None)
>   ie.current_env().set_cache_manager(
>       cache.FileBasedCacheManager(), dummy_pipeline_3)
>   ie.current_env().mark_pcollection_computed([dummy_pcoll])
>   ie.current_env().set_cached_source_signature(
>       dummy_non_inspectable_pipeline, None)
>   ie.current_env().set_pipeline_result(
>       dummy_pipeline_5, runner.PipelineResult(runner.PipelineState.RUNNING))
> > mocked_cleanup.assert_not_called()
> apache_beam/runners/interactive/interactive_environment_test.py:265: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> _mock_self = 
> def assert_not_called(_mock_self):
> """assert that the mock was never called.
> """
> self = _mock_self
> if self.call_count != 0:
> msg = ("Expected '%s' to not have been called. Called %s times." %
>(self._mock_name or 'mock', self.call_count))
> >   raise AssertionError(msg)
> E   AssertionError: Expected 'cleanup' to not have been called. Called 1 times.
> /usr/lib/python3.7/unittest/mock.py:792: AssertionError
> {code}
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/3609/
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/3607/
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/3593/





[jira] [Updated] (BEAM-11705) Write to bigquery always assigns unique insert id per row causing performance issue

2021-01-27 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11705:
-
Description: 
The `ignore_insert_id` argument in BigQuery IO Connector
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
 does not take effect.

Because the implementation of sending insert rows request always uses an auto 
generated uuid even when the insert_ids is set to None when `ignore_insert_id` 
is True: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062

The implementation should explicitly set insert_id as None instead of using a 
generated uuid, an example: 
https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33

A unique insert id per row would make the streaming inserts very slow.

Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any 
effect when `ignore_insert_id` is True, because the implementation skips 
the `ReshufflePerKey` 
(https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1422).
 When `ignore_insert_id` is True, we seem to lose the batch size control?

  was:
The `ignore_insert_id` argument in BigQuery IO Connector
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
 does not take effect.

Because the implementation of sending insert rows request always uses an auto 
generated uuid even when the insert_ids is set to None when `ignore_insert_id` 
is True: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062

The implementation should explicitly set insert_id as None instead of using a 
generated uuid, an example: 
https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33

An unique insert id per row would make the streaming inserts very slow.

Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any 
effect when `ignore_insert_id` is True in the implementation because it skipped 
the `ReshufflePerKey`. When `ignore_insert_id` is True, we seem to lost the 
batch size control?


> Write to bigquery always assigns unique insert id per row causing performance 
> issue
> ---
>
> Key: BEAM-11705
> URL: https://issues.apache.org/jira/browse/BEAM-11705
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Ning Kang
>Assignee: Pablo Estrada
>Priority: P2
>
> The `ignore_insert_id` argument in BigQuery IO Connector
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
>  does not take effect.
> Because the implementation of sending insert rows request always uses an auto 
> generated uuid even when the insert_ids is set to None when 
> `ignore_insert_id` is True: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062
> The implementation should explicitly set insert_id as None instead of using a 
> generated uuid, an example: 
> https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33
> A unique insert id per row would make the streaming inserts very slow.
> Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any 
> effect when `ignore_insert_id` is True, because the implementation skips 
> the `ReshufflePerKey` 
> (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1422).
>  When `ignore_insert_id` is True, we seem to lose the batch size control?
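
The behaviour the ticket asks for can be sketched as a small helper that 
chooses the per-row insert ids. `build_insert_ids` is a hypothetical function 
written for illustration; it is not the code in `bigquery_tools.py`. The 
linked python-bigquery sample follows the same idea by passing a list of 
`None` values as the row ids of the insert request.

```python
import uuid


def build_insert_ids(rows, ignore_insert_id):
  """Hypothetical helper: one insert id per row, or None for each row."""
  if ignore_insert_id:
    # An explicit None per row means no generated id is sent for that row.
    return [None] * len(rows)
  # Otherwise every row gets its own generated id.
  return [str(uuid.uuid4()) for _ in rows]


rows = [{'id': 1}, {'id': 2}]
ids_ignored = build_insert_ids(rows, ignore_insert_id=True)
ids_unique = build_insert_ids(rows, ignore_insert_id=False)
```

The point of the sketch is only that the `None` values have to be set 
explicitly; leaving the ids unset and letting a lower layer fill in a 
generated uuid is what the ticket describes as the bug.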





[jira] [Updated] (BEAM-11705) Write to bigquery always assigns unique insert id per row causing performance issue

2021-01-27 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11705:
-
Description: 
The `ignore_insert_id` argument in BigQuery IO Connector
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
 does not take effect.

Because the implementation of sending insert rows request always uses an auto 
generated uuid even when the insert_ids is set to None when `ignore_insert_id` 
is True: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062

The implementation should explicitly set insert_id as None instead of using a 
generated uuid, an example: 
https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33

A unique insert id per row would make the streaming inserts very slow.

Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any 
effect when `ignore_insert_id` is True, because the implementation skips 
the `ReshufflePerKey`. When `ignore_insert_id` is True, we seem to lose the 
batch size control?

  was:
The `ignore_insert_id` argument in BigQuery IO Connector
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
 does not take effect.

Because the implementation of sending insert rows request always uses an auto 
generated uuid even when the insert_ids is set to None when `ignore_insert_id` 
is True: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062

The implementation should explicitly set insert_id as None instead of using a 
generated uuid, an example: 
https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33

An unique insert id per row would make the streaming inserts very slow.


> Write to bigquery always assigns unique insert id per row causing performance 
> issue
> ---
>
> Key: BEAM-11705
> URL: https://issues.apache.org/jira/browse/BEAM-11705
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Ning Kang
>Assignee: Pablo Estrada
>Priority: P2
>
> The `ignore_insert_id` argument in BigQuery IO Connector
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
>  does not take effect.
> Because the implementation of sending insert rows request always uses an auto 
> generated uuid even when the insert_ids is set to None when 
> `ignore_insert_id` is True: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062
> The implementation should explicitly set insert_id as None instead of using a 
> generated uuid, an example: 
> https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33
> A unique insert id per row would make the streaming inserts very slow.
> Additionally, the `DEFAULT_SHARDS_PER_DESTINATION` doesn't seem to take any 
> effect when `ignore_insert_id` is True, because the implementation 
> skips the `ReshufflePerKey`. When `ignore_insert_id` is True, we seem to 
> lose the batch size control?





[jira] [Updated] (BEAM-11705) Write to bigquery always assigns unique insert id per row causing performance issue

2021-01-27 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11705:
-
Description: 
The `ignore_insert_id` argument in BigQuery IO Connector
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
 does not take effect.

Because the implementation of sending insert rows request always uses an auto 
generated uuid even when the insert_ids is set to None when `ignore_insert_id` 
is True: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062

The implementation should explicitly set insert_id as None instead of using a 
generated uuid, an example: 
https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33

A unique insert id per row would make the streaming inserts very slow.

  was:
The `ignore_insert_id` argument in BigQuery IO Connector
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
 does not take effect.

Because the implementation of sending insert rows request always uses an auto 
generated uuid even when the insert_ids is set to None when `ignore_insert_id` 
is False: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062

The implementation should explicitly set insert_id as None instead of using a 
generated uuid, an example: 
https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33

An unique insert id per row would make the streaming inserts very slow.


> Write to bigquery always assigns unique insert id per row causing performance 
> issue
> ---
>
> Key: BEAM-11705
> URL: https://issues.apache.org/jira/browse/BEAM-11705
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Ning Kang
>Assignee: Pablo Estrada
>Priority: P2
>
> The `ignore_insert_id` argument in BigQuery IO Connector
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
>  does not take effect.
> Because the implementation of sending insert rows request always uses an auto 
> generated uuid even when the insert_ids is set to None when 
> `ignore_insert_id` is True: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062
> The implementation should explicitly set insert_id as None instead of using a 
> generated uuid, an example: 
> https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33
> A unique insert id per row would make the streaming inserts very slow.





[jira] [Created] (BEAM-11705) Write to bigquery always assigns unique insert id per row causing performance issue

2021-01-27 Thread Ning Kang (Jira)
Ning Kang created BEAM-11705:


 Summary: Write to bigquery always assigns unique insert id per row 
causing performance issue
 Key: BEAM-11705
 URL: https://issues.apache.org/jira/browse/BEAM-11705
 Project: Beam
  Issue Type: Improvement
  Components: io-py-gcp
Reporter: Ning Kang


The `ignore_insert_id` argument in BigQuery IO Connector
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1471
 does not take effect.

This is because the implementation that sends the insert-rows request always 
uses an auto-generated UUID, even when insert_ids is set to None (that is, when 
`ignore_insert_id` is True): 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1062

The implementation should explicitly set insert_id to None instead of using a 
generated UUID. For an example, see: 
https://github.com/googleapis/python-bigquery/blob/master/samples/table_insert_rows_explicit_none_insert_ids.py#L33

A unique insert id per row would make the streaming inserts very slow.
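
The requested behavior could be sketched roughly as follows. This is only an 
illustration, not Beam's actual code: `build_insert_ids` is a hypothetical 
helper, and the only assumption taken from the description above is that 
`ignore_insert_id=True` should yield None for every row instead of a generated 
UUID.

```python
import uuid


def build_insert_ids(rows, ignore_insert_id):
    """Return one insert id per row (hypothetical helper, not a Beam API).

    When ignore_insert_id is True, every id is explicitly None so that no
    per-row unique id is attached to the streaming insert request.
    """
    if ignore_insert_id:
        # Explicit None per row instead of a generated UUID.
        return [None] * len(rows)
    # Default behavior: one unique id per row.
    return [str(uuid.uuid4()) for _ in rows]


rows = [{'id': 1}, {'id': 2}]
ids_ignored = build_insert_ids(rows, ignore_insert_id=True)
ids_unique = build_insert_ids(rows, ignore_insert_id=False)
```

With the google-cloud-bigquery client, a list like `ids_ignored` would 
presumably be passed as the `row_ids` argument of `insert_rows`, which is what 
the linked sample appears to do.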





[jira] [Created] (BEAM-11692) Support DataflowRunner as an underlying runner of InteractiveRunner

2021-01-26 Thread Ning Kang (Jira)
Ning Kang created BEAM-11692:


 Summary: Support DataflowRunner as an underlying runner of 
InteractiveRunner
 Key: BEAM-11692
 URL: https://issues.apache.org/jira/browse/BEAM-11692
 Project: Beam
  Issue Type: Improvement
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


When using DataflowRunner as the underlying runner and GCS buckets as cache 
dirs, `ib.show()` and `ib.collect()` misbehave because the cache manager does 
not wait for the cache files to be written, while DataflowRunner running on 
the Dataflow service takes a long time to complete the write tasks.
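
One way to picture the missing wait is a simple poll with a timeout. The 
sketch below is an assumption for illustration only: `wait_for_cache`, its 
parameters and the `exists` callback are hypothetical names, and it checks a 
local path by default rather than a GCS object.

```python
import os
import time


def wait_for_cache(path, timeout_s=60.0, poll_interval_s=1.0,
                   exists=os.path.exists):
    """Poll until `path` exists or the timeout elapses (hypothetical helper).

    `exists` is any callable taking a path and returning a bool; for a GCS
    cache dir it would have to be replaced by a GCS-aware existence check.
    Returns True if the cache appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A reader of the cache would call something like this before reading and treat 
a False result as "cache not ready yet" instead of as an empty result.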





[jira] [Commented] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform

2021-01-20 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268918#comment-17268918
 ] 

Ning Kang commented on BEAM-10708:
--

Thanks for the comments, Cham and Brian! If `from_runner_api` is not a possible 
solution to copy pipelines, the InteractiveRunner has to either copy the 
pipeline objects explicitly or make changes directly on the proto before each 
iteration of execution.

> InteractiveRunner cannot execute pipeline with cross-language transform
> ---
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P2
>
> The InteractiveRunner crashes when given a pipeline that includes a 
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
> {code:python}
> p = beam.Pipeline(InteractiveRunner())
> pc = (p | SqlTransform("""SELECT
> CAST(1 AS INT) AS `id`,
> CAST('foo' AS VARCHAR) AS `str`,
> CAST(3.14  AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}
> The problem occurs when 
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
>  creates a copy of the pipeline by [writing it to proto and reading it 
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
>  Reading it back fails because some of the pipeline is not written in Python.





[jira] [Comment Edited] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform

2021-01-12 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263715#comment-17263715
 ] 

Ning Kang edited comment on BEAM-10708 at 1/13/21, 2:30 AM:


As of today (2021-01-12), when a pipeline including a SqlTransform is executed 
with InteractiveRunner() or invoked in `from_runner_api(to_runner_api())`, it 
fails. An example failure stack trace follows:

{code:python}
---
ValueError                                Traceback (most recent call last)
 in 
> 1 ib.show(pcoll)

~/beam/sdks/python/apache_beam/runners/interactive/utils.py in 
run_within_progress_indicator(*args, **kwargs)
226   def run_within_progress_indicator(*args, **kwargs):
227 with ProgressIndicator('Processing...', 'Done.'):
--> 228   return func(*args, **kwargs)
229 
230   return run_within_progress_indicator

~/beam/sdks/python/apache_beam/runners/interactive/interactive_beam.py in 
show(*pcolls, **configs)
484   recording_manager = ie.current_env().get_recording_manager(
485   user_pipeline, create_if_absent=True)
--> 486   recording = recording_manager.record(pcolls, max_n=n, 
max_duration=duration)
487 
488   # Catch a KeyboardInterrupt to gracefully cancel the recording and

~/beam/sdks/python/apache_beam/runners/interactive/recording_manager.py in 
record(self, pcolls, max_n, max_duration)
420 # arbitrary variables.
421 self._watch(pcolls)
--> 422 pipeline_instrument = pi.PipelineInstrument(self.user_pipeline)
423 self.record_pipeline()
424 

~/beam/sdks/python/apache_beam/runners/interactive/pipeline_instrument.py in 
__init__(self, pipeline, options)
113 # proto is stable. The snapshot of pipeline will not be mutated 
within this
114 # module and can be used to recover original pipeline if needed.
--> 115 self._pipeline_snap = beam.pipeline.Pipeline.from_runner_api(
116 pipeline.to_runner_api(use_fake_coders=True), pipeline.runner, 
options)
117 ie.current_env().add_derived_pipeline(self._pipeline, 
self._pipeline_snap)

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, runner, 
options, return_context)
900 if proto.root_transform_ids:
901   root_transform_id, = proto.root_transform_ids
--> 902   p.transforms_stack = 
[context.transforms.get_by_id(root_transform_id)]
903 else:
904   p.transforms_stack = [AppliedPTransform(None, None, '', None)]

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, 
id)
113 # type: (str) -> PortableObjectT
114 if id not in self._id_to_obj:
--> 115   self._id_to_obj[id] = self._obj_type.from_runner_api(
116   self._id_to_proto[id], self._pipeline_context)
117 return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250 result.parts = []
   1251 for transform_id in proto.subtransforms:
-> 1252   part = context.transforms.get_by_id(transform_id)
   1253   part.parent = result
   1254   result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, 
id)
113 # type: (str) -> PortableObjectT
114 if id not in self._id_to_obj:
--> 115   self._id_to_obj[id] = self._obj_type.from_runner_api(
116   self._id_to_proto[id], self._pipeline_context)
117 return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250 result.parts = []
   1251 for transform_id in proto.subtransforms:
-> 1252   part = context.transforms.get_by_id(transform_id)
   1253   part.parent = result
   1254   result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, 
id)
113 # type: (str) -> PortableObjectT
114 if id not in self._id_to_obj:
--> 115   self._id_to_obj[id] = self._obj_type.from_runner_api(
116   self._id_to_proto[id], self._pipeline_context)
117 return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250 result.parts = []
   1251 for transform_id in proto.subtransforms:
-> 1252   part = context.transforms.get_by_id(transform_id)
   1253   part.parent = result
   1254   result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, 
id)
113 # type: (str) -> PortableObjectT
114 if id not in self._id_to_obj:
--> 115   self._id_to_obj[id] = self._obj_type.from_runner_api(
116   self._id_to_proto[id], self._pipeline_context)
117 return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(

[jira] [Commented] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform

2021-01-12 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263715#comment-17263715
 ] 

Ning Kang commented on BEAM-10708:
--

As of today (2021-01-12), when a pipeline including a SqlTransform is executed 
with InteractiveRunner(), it fails. An example failure stack trace follows:

{code:python}
---
ValueError                                Traceback (most recent call last)
 in 
> 1 ib.show(pcoll)

~/beam/sdks/python/apache_beam/runners/interactive/utils.py in 
run_within_progress_indicator(*args, **kwargs)
226   def run_within_progress_indicator(*args, **kwargs):
227 with ProgressIndicator('Processing...', 'Done.'):
--> 228   return func(*args, **kwargs)
229 
230   return run_within_progress_indicator

~/beam/sdks/python/apache_beam/runners/interactive/interactive_beam.py in 
show(*pcolls, **configs)
484   recording_manager = ie.current_env().get_recording_manager(
485   user_pipeline, create_if_absent=True)
--> 486   recording = recording_manager.record(pcolls, max_n=n, 
max_duration=duration)
487 
488   # Catch a KeyboardInterrupt to gracefully cancel the recording and

~/beam/sdks/python/apache_beam/runners/interactive/recording_manager.py in 
record(self, pcolls, max_n, max_duration)
420 # arbitrary variables.
421 self._watch(pcolls)
--> 422 pipeline_instrument = pi.PipelineInstrument(self.user_pipeline)
423 self.record_pipeline()
424 

~/beam/sdks/python/apache_beam/runners/interactive/pipeline_instrument.py in 
__init__(self, pipeline, options)
113 # proto is stable. The snapshot of pipeline will not be mutated 
within this
114 # module and can be used to recover original pipeline if needed.
--> 115 self._pipeline_snap = beam.pipeline.Pipeline.from_runner_api(
116 pipeline.to_runner_api(use_fake_coders=True), pipeline.runner, 
options)
117 ie.current_env().add_derived_pipeline(self._pipeline, 
self._pipeline_snap)

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, runner, 
options, return_context)
900 if proto.root_transform_ids:
901   root_transform_id, = proto.root_transform_ids
--> 902   p.transforms_stack = 
[context.transforms.get_by_id(root_transform_id)]
903 else:
904   p.transforms_stack = [AppliedPTransform(None, None, '', None)]

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, 
id)
113 # type: (str) -> PortableObjectT
114 if id not in self._id_to_obj:
--> 115   self._id_to_obj[id] = self._obj_type.from_runner_api(
116   self._id_to_proto[id], self._pipeline_context)
117 return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250 result.parts = []
   1251 for transform_id in proto.subtransforms:
-> 1252   part = context.transforms.get_by_id(transform_id)
   1253   part.parent = result
   1254   result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, 
id)
113 # type: (str) -> PortableObjectT
114 if id not in self._id_to_obj:
--> 115   self._id_to_obj[id] = self._obj_type.from_runner_api(
116   self._id_to_proto[id], self._pipeline_context)
117 return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250 result.parts = []
   1251 for transform_id in proto.subtransforms:
-> 1252   part = context.transforms.get_by_id(transform_id)
   1253   part.parent = result
   1254   result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, 
id)
113 # type: (str) -> PortableObjectT
114 if id not in self._id_to_obj:
--> 115   self._id_to_obj[id] = self._obj_type.from_runner_api(
116   self._id_to_proto[id], self._pipeline_context)
117 return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250 result.parts = []
   1251 for transform_id in proto.subtransforms:
-> 1252   part = context.transforms.get_by_id(transform_id)
   1253   part.parent = result
   1254   result.parts.append(part)

~/beam/sdks/python/apache_beam/runners/pipeline_context.py in get_by_id(self, 
id)
113 # type: (str) -> PortableObjectT
114 if id not in self._id_to_obj:
--> 115   self._id_to_obj[id] = self._obj_type.from_runner_api(
116   self._id_to_proto[id], self._pipeline_context)
117 return self._id_to_obj[id]

~/beam/sdks/python/apache_beam/pipeline.py in from_runner_api(proto, context)
   1250 result.parts = []
   1251 for transform_id in proto.subtransforms:
-

[jira] [Updated] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform

2021-01-12 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10708:
-
Priority: P2  (was: P3)

> InteractiveRunner cannot execute pipeline with cross-language transform
> ---
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P2
>
> The InteractiveRunner crashes when given a pipeline that includes a 
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
> {code:python}
> p = beam.Pipeline(InteractiveRunner())
> pc = (p | SqlTransform("""SELECT
> CAST(1 AS INT) AS `id`,
> CAST('foo' AS VARCHAR) AS `str`,
> CAST(3.14  AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}
> The problem occurs when 
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
>  creates a copy of the pipeline by [writing it to proto and reading it 
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
>  Reading it back fails because some of the pipeline is not written in Python.





[jira] [Assigned] (BEAM-10708) InteractiveRunner cannot execute pipeline with cross-language transform

2021-01-12 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reassigned BEAM-10708:


Assignee: Ning Kang

> InteractiveRunner cannot execute pipeline with cross-language transform
> ---
>
> Key: BEAM-10708
> URL: https://issues.apache.org/jira/browse/BEAM-10708
> Project: Beam
>  Issue Type: Bug
>  Components: cross-language
>Reporter: Brian Hulette
>Assignee: Ning Kang
>Priority: P3
>
> The InteractiveRunner crashes when given a pipeline that includes a 
> cross-language transform.
> Here's the example I tried to run in a jupyter notebook:
> {code:python}
> p = beam.Pipeline(InteractiveRunner())
> pc = (p | SqlTransform("""SELECT
> CAST(1 AS INT) AS `id`,
> CAST('foo' AS VARCHAR) AS `str`,
> CAST(3.14  AS DOUBLE) AS `flt`"""))
> df = interactive_beam.collect(pc)
> {code}
> The problem occurs when 
> [pipeline_fragment.py|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L66]
>  creates a copy of the pipeline by [writing it to proto and reading it 
> back|https://github.com/apache/beam/blob/dce1eb83b8d5137c56ac58568820c24bd8fda526/sdks/python/apache_beam/runners/interactive/pipeline_fragment.py#L120].
>  Reading it back fails because some of the pipeline is not written in Python.





[jira] [Work started] (BEAM-11362) retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit'

2020-11-30 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-11362 started by Ning Kang.

> retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) 
> AttributeError: 'NoneType' object has no attribute 'visit'
> 
>
> Key: BEAM-11362
> URL: https://issues.apache.org/jira/browse/BEAM-11362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>
> Traceback (most recent call last):
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/photos/vision/features/delf/extract/global_descriptor/python/revisited_datasets/beam/eval_utils_test.py",
>  line 614, in testProduceOutputVisualization
> _ = pipeline.run()
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/pipeline.py",
>  line 553, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/interactive_runner.py",
>  line 136, in run_pipeline
> inst.watch_sources(pipeline)
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/pipeline_instrument.py",
>  line 1008, in watch_sources
> retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor())
> AttributeError: 'NoneType' object has no attribute 'visit'
> Probably related to this change: https://github.com/apache/beam/pull/13335





[jira] [Commented] (BEAM-11362) retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit'

2020-11-30 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241133#comment-17241133
 ] 

Ning Kang commented on BEAM-11362:
--

This is the intended behavior after the change in 
https://github.com/apache/beam/pull/13335.
An InteractiveRunner user who defines a pipeline outside the main scope 
(such as within a function) has to use `ib.watch({'pipeline_name': 
pipeline_instance})` or `ib.watch(locals())` to track the pipeline in the 
interactive environment.
Otherwise, they will run into this issue.

This does not impact Beam notebook users, because they either define pipelines 
in the main scope (directly in notebook cells) or pass around the 
pipeline/pcollection objects and use them in `ib.show()`, `ib.show_graph()` or 
`ib.collect()` (the APIs track the pcollections/pipelines automatically).

This only affects users who use InteractiveRunner to materialize 
PCollections, such as in tests, where they don't really care about the 
pipeline object.

The fix is to add an explicit `watch` statement in the same scope, after the 
pipeline is defined.

> retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) 
> AttributeError: 'NoneType' object has no attribute 'visit'
> 
>
> Key: BEAM-11362
> URL: https://issues.apache.org/jira/browse/BEAM-11362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>
> Traceback (most recent call last):
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/photos/vision/features/delf/extract/global_descriptor/python/revisited_datasets/beam/eval_utils_test.py",
>  line 614, in testProduceOutputVisualization
> _ = pipeline.run()
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/pipeline.py",
>  line 553, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/interactive_runner.py",
>  line 136, in run_pipeline
> inst.watch_sources(pipeline)
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/pipeline_instrument.py",
>  line 1008, in watch_sources
> retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor())
> AttributeError: 'NoneType' object has no attribute 'visit'
> Probably related to this change: https://github.com/apache/beam/pull/13335





[jira] [Updated] (BEAM-11362) retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit'

2020-11-30 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11362:
-
Status: Resolved  (was: In Progress)

> retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) 
> AttributeError: 'NoneType' object has no attribute 'visit'
> 
>
> Key: BEAM-11362
> URL: https://issues.apache.org/jira/browse/BEAM-11362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>
> Traceback (most recent call last):
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/photos/vision/features/delf/extract/global_descriptor/python/revisited_datasets/beam/eval_utils_test.py",
>  line 614, in testProduceOutputVisualization
> _ = pipeline.run()
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/pipeline.py",
>  line 553, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/interactive_runner.py",
>  line 136, in run_pipeline
> inst.watch_sources(pipeline)
>   File 
> "/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/pipeline_instrument.py",
>  line 1008, in watch_sources
> retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor())
> AttributeError: 'NoneType' object has no attribute 'visit'
> Probably related to this change: https://github.com/apache/beam/pull/13335





[jira] [Created] (BEAM-11362) retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) AttributeError: 'NoneType' object has no attribute 'visit'

2020-11-30 Thread Ning Kang (Jira)
Ning Kang created BEAM-11362:


 Summary: 
retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor()) 
AttributeError: 'NoneType' object has no attribute 'visit'
 Key: BEAM-11362
 URL: https://issues.apache.org/jira/browse/BEAM-11362
 Project: Beam
  Issue Type: Improvement
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


Traceback (most recent call last):
  File 
"/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/photos/vision/features/delf/extract/global_descriptor/python/revisited_datasets/beam/eval_utils_test.py",
 line 614, in testProduceOutputVisualization
_ = pipeline.run()
  File 
"/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/pipeline.py",
 line 553, in run
return self.runner.run_pipeline(self, self._options)
  File 
"/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/interactive_runner.py",
 line 136, in run_pipeline
inst.watch_sources(pipeline)
  File 
"/build/work/f1a2b3ea7c34e8e49f1a90317e5acde7889a/google3/runfiles/google3/third_party/py/apache_beam/runners/interactive/pipeline_instrument.py",
 line 1008, in watch_sources
retrieved_user_pipeline.visit(CacheableUnboundedPCollectionVisitor())
AttributeError: 'NoneType' object has no attribute 'visit'

Probably related to this change: https://github.com/apache/beam/pull/13335





[jira] [Created] (BEAM-11339) Cache clean up might fail on Windows

2020-11-24 Thread Ning Kang (Jira)
Ning Kang created BEAM-11339:


 Summary: Cache clean up might fail on Windows
 Key: BEAM-11339
 URL: https://issues.apache.org/jira/browse/BEAM-11339
 Project: Beam
  Issue Type: Improvement
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


Details can be found in this PR:
https://github.com/apache/beam/pull/12779

{code:java}
apache_beam\runners\interactive\recording_manager_test.py:75: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
apache_beam\runners\interactive\interactive_environment.py:118: in new_env
_interactive_beam_env.cleanup()
apache_beam\runners\interactive\interactive_environment.py:272: in cleanup
cache_manager.cleanup()
apache_beam\runners\interactive\caching\streaming_cache.py:391: in cleanup
shutil.rmtree(self._cache_dir)
c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:500: in rmtree
return _rmtree_unsafe(path, onerror)
c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:390: in _rmtree_unsafe
_rmtree_unsafe(fullname, onerror)
c:\hostedtoolcache\windows\python\3.6.8\x64\lib\shutil.py:395: in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

...

>   os.unlink(fullname)
E   PermissionError: [WinError 32] The process cannot access 
the file because it is being used by another process: 
'D:\\a\\beam\\beam\\sdks\\python\\target\\.tox\\py36-win\\tmp\\it-4m1c1oje2145793178144\\full\\fb91a47796-2145832985040-2145832986608-2145793178144'
{code}


This does not happen on Linux-like systems:
https://user-images.githubusercontent.com/4423149/100034273-b343ef80-2db0-11eb-8988-dcfc2c79322c.png

A potential solution is to ignore errors during the cleanup and, when one 
occurs, log a message asking the user to clean up the leftover files manually.
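
A minimal sketch of that idea, using only the standard library, is shown 
below. It is not the actual fix: `cleanup_cache_dir` and `_log_cleanup_failure` 
are hypothetical names. Note that `shutil.rmtree` does not call an error 
handler when `ignore_errors=True`, so the sketch passes only a handler that 
logs and does not re-raise, which has the same non-failing effect.

```python
import logging
import os
import shutil
import sys
import tempfile

_LOGGER = logging.getLogger(__name__)


def _log_cleanup_failure(func, path, exc):
    # `exc` is an exception instance (onexc) or an exc_info tuple (onerror).
    # Logging instead of raising lets the rest of the cleanup continue.
    _LOGGER.warning(
        'Failed to remove %s (%s). Please clean it up manually.', path, exc)


def cleanup_cache_dir(cache_dir):
    """Remove cache_dir, logging (not raising) on failures such as WinError 32."""
    if sys.version_info >= (3, 12):
        # onerror is deprecated since Python 3.12 in favor of onexc.
        shutil.rmtree(cache_dir, onexc=_log_cleanup_failure)
    else:
        shutil.rmtree(cache_dir, onerror=_log_cleanup_failure)


# Usage: a throwaway directory standing in for the cache dir.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'cache-file'), 'w').close()
cleanup_cache_dir(demo_dir)
```

Files that are still held open by another process would be left behind and 
reported in the log, instead of failing the whole cleanup.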





[jira] [Assigned] (BEAM-10970) apache_beam.runners.interactive.recording_manager_test.ElementStreamTest is flaky on Windows

2020-10-21 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reassigned BEAM-10970:


Assignee: Sam Rohde  (was: Ning Kang)

> apache_beam.runners.interactive.recording_manager_test.ElementStreamTest is 
> flaky on Windows
> 
>
> Key: BEAM-10970
> URL: https://issues.apache.org/jira/browse/BEAM-10970
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robin Qiu
>Assignee: Sam Rohde
>Priority: P1
>  Labels: currently-failing
>
> There are two test suites that are failing now:
>  
> Python Unit Tests (macos-latest, 3.7, py37): 
> [https://github.com/apache/beam/pull/12935/checks?check_run_id=1163443154]
>  
> Python Unit Tests (windows-latest, 3.7, py37): 
> [https://github.com/apache/beam/pull/12935/checks?check_run_id=1163443198]
>  
> I am not very familiar with the Python test suites in GitHub checks, but I 
> think this failure should be a release blocker?





[jira] [Commented] (BEAM-10970) apache_beam.runners.interactive.recording_manager_test.ElementStreamTest is flaky on Windows

2020-10-21 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218681#comment-17218681
 ] 

Ning Kang commented on BEAM-10970:
--

Hi [~rohdesam], I think you mentioned you were working on this issue. 
Please feel free to mark this ticket as a duplicate of the one you are working 
on. Thanks!

> apache_beam.runners.interactive.recording_manager_test.ElementStreamTest is 
> flaky on Windows
> 
>
> Key: BEAM-10970
> URL: https://issues.apache.org/jira/browse/BEAM-10970
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robin Qiu
>Assignee: Ning Kang
>Priority: P1
>  Labels: currently-failing
>
> There are two test suites that are failing now:
>  
> Python Unit Tests (macos-latest, 3.7, py37): 
> [https://github.com/apache/beam/pull/12935/checks?check_run_id=1163443154]
>  
> Python Unit Tests (windows-latest, 3.7, py37): 
> [https://github.com/apache/beam/pull/12935/checks?check_run_id=1163443198]
>  
> I am not very familiar with the Python test suites in GitHub checks, but I 
> think this failure should be a release blocker?





[jira] [Updated] (BEAM-11045) Screen diff integration tests fail

2020-10-13 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11045:
-
Status: Resolved  (was: In Progress)

> Screen diff integration tests fail
> --
>
> Key: BEAM-11045
> URL: https://issues.apache.org/jira/browse/BEAM-11045
> Project: Beam
>  Issue Type: Test
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The tests still function with Python 3.7.
> Golden screenshots and chromedriver-binary need to be updated because of the 
> version advancement of the newest Chrome browser.
> The tests start failing with Python 3.8 due to an incompatibility in 
> `Process` from the `multiprocessing` module.
> A process is used to start an HTTP server to serve notebook results in 
> headless browser for taking screenshots. 





[jira] [Work started] (BEAM-11045) Screen diff integration tests fail

2020-10-13 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-11045 started by Ning Kang.

> Screen diff integration tests fail
> --
>
> Key: BEAM-11045
> URL: https://issues.apache.org/jira/browse/BEAM-11045
> Project: Beam
>  Issue Type: Test
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The tests still function with Python 3.7.
> Golden screenshots and chromedriver-binary need to be updated because of the 
> version advancement of the newest Chrome browser.
> The tests start failing with Python 3.8 due to an incompatibility in 
> `Process` from the `multiprocessing` module.
> A process is used to start an HTTP server to serve notebook results in 
> headless browser for taking screenshots. 





[jira] [Updated] (BEAM-11056) Fix warning message and rename old APIs

2020-10-13 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11056:
-
Status: Resolved  (was: In Progress)

> Fix warning message and rename old APIs
> ---
>
> Key: BEAM-11056
> URL: https://issues.apache.org/jira/browse/BEAM-11056
> Project: Beam
>  Issue Type: Bug
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When invoking `ib.evict_captured_data()`, the logging contains a typo 
> `recordeddata`:
> `You have requested Interactive Beam to evict all recordeddata that could be 
> deterministically replayed among multiple pipeline runs.`
> Also, `capture_control` should be renamed to `record_control`. All 
> occurrences of `capture` should be changed to `record` to stay consistent 
> with the large source recording improvements.





[jira] [Work started] (BEAM-11056) Fix warning message and rename old APIs

2020-10-13 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-11056 started by Ning Kang.

> Fix warning message and rename old APIs
> ---
>
> Key: BEAM-11056
> URL: https://issues.apache.org/jira/browse/BEAM-11056
> Project: Beam
>  Issue Type: Bug
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When invoking `ib.evict_captured_data()`, the log message contains the typo
> `recordeddata`:
> `You have requested Interactive Beam to evict all recordeddata that could be 
> deterministically replayed among multiple pipeline runs.`
> Also, `capture_control` should be renamed to `record_control`. All
> occurrences of `capture` should be changed to `record`, for consistency
> with the large source recording improvements.





[jira] [Created] (BEAM-11056) Fix warning message and rename old APIs

2020-10-12 Thread Ning Kang (Jira)
Ning Kang created BEAM-11056:


 Summary: Fix warning message and rename old APIs
 Key: BEAM-11056
 URL: https://issues.apache.org/jira/browse/BEAM-11056
 Project: Beam
  Issue Type: Bug
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


When invoking `ib.evict_captured_data()`, the log message contains the typo
`recordeddata`:
`You have requested Interactive Beam to evict all recordeddata that could be 
deterministically replayed among multiple pipeline runs.`

Also, `capture_control` should be renamed to `record_control`. All
occurrences of `capture` should be changed to `record`, for consistency
with the large source recording improvements.





[jira] [Updated] (BEAM-11039) Resolve conflicts between TFMA and Facets imports

2020-10-08 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11039:
-
Status: Resolved  (was: In Progress)

> Resolve conflicts between TFMA and Facets imports
> -
>
> Key: BEAM-11039
> URL: https://issues.apache.org/jira/browse/BEAM-11039
> Project: Beam
>  Issue Type: Test
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In a trusted notebook, Facets will load automatically, and declare extra 
> variables in the global scope 
> (https://screenshot.googleplex.com/9J7JGH8xVUjtfqa). This can have side 
> effects if other libraries try to declare the same variables.





[jira] [Work started] (BEAM-11039) Resolve conflicts between TFMA and Facets imports

2020-10-08 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-11039 started by Ning Kang.

> Resolve conflicts between TFMA and Facets imports
> -
>
> Key: BEAM-11039
> URL: https://issues.apache.org/jira/browse/BEAM-11039
> Project: Beam
>  Issue Type: Test
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In a trusted notebook, Facets will load automatically, and declare extra 
> variables in the global scope 
> (https://screenshot.googleplex.com/9J7JGH8xVUjtfqa). This can have side 
> effects if other libraries try to declare the same variables.





[jira] [Created] (BEAM-11045) Screen diff integration tests fail

2020-10-08 Thread Ning Kang (Jira)
Ning Kang created BEAM-11045:


 Summary: Screen diff integration tests fail
 Key: BEAM-11045
 URL: https://issues.apache.org/jira/browse/BEAM-11045
 Project: Beam
  Issue Type: Test
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


The tests still work with Python 3.7.
The golden screenshots and chromedriver-binary need to be updated because a
newer version of the Chrome browser has been released.

The tests start failing with Python 3.8 due to an incompatibility in `Process`
from the `multiprocessing` module.
A process is used to start an HTTP server that serves notebook results to a
headless browser for taking screenshots.
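
As a rough illustration of the serving step only, the sketch below starts a
static HTTP server for a directory of rendered results. It swaps the
`multiprocessing.Process` for a daemon thread, which is an assumption made
here to sidestep process start-method differences and is not the fix adopted
in this issue. The names `start_static_server` and `fetch` are hypothetical.

```python
import functools
import http.client
import http.server
import threading


def start_static_server(directory):
    """Serve `directory` on a free localhost port from a daemon thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory))
    # Port 0 asks the OS for any free port; read it back from server_address.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def fetch(server, path):
    """GET `path` from the running server and return the body as text."""
    host, port = server.server_address[:2]
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", path)
        return connection.getresponse().read().decode("utf-8")
    finally:
        connection.close()
```

A headless browser would be pointed at the same host and port instead of
calling `fetch`. Call `server.shutdown()` and `server.server_close()` when the
screenshots are done.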







[jira] [Commented] (BEAM-11039) Resolve conflicts between TFMA and Facets imports

2020-10-07 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209894#comment-17209894
 ] 

Ning Kang commented on BEAM-11039:
--

I had a discussion with James from the Facets side, and we decided to enclose
the HTML templates in Interactive Beam in an iframe to avoid conflicts with
TFMA.

This also resolves the mis-renderings in JupyterLab:
* Oddly shaped widgets when the HTML import happens in the same notebook cell
that renders the Facets widgets.
* Empty displays for every rendering except the first when multiple cells
render Facets widgets and the HTML is imported in each notebook output area.
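
The iframe approach can be sketched as follows. This is a minimal
illustration, not the actual Interactive Beam template code: the helper name
`wrap_in_iframe` and the use of the `srcdoc` attribute are assumptions. The
point is that scripts inside the iframe run in their own document, so
variables they declare do not leak into the notebook page's global scope.

```python
import html


def wrap_in_iframe(inner_html, width="100%", height="600px"):
    """Return an <iframe> whose document is `inner_html` (hypothetical helper)."""
    # Escape the payload so it can sit inside a double-quoted attribute.
    escaped = html.escape(inner_html, quote=True)
    return (
        f'<iframe srcdoc="{escaped}" '
        f'style="border:0;width:{width};height:{height}"></iframe>')
```

The returned string could then be displayed in a notebook output area in place
of the raw template.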

> Resolve conflicts between TFMA and Facets imports
> -
>
> Key: BEAM-11039
> URL: https://issues.apache.org/jira/browse/BEAM-11039
> Project: Beam
>  Issue Type: Test
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>
> In a trusted notebook, Facets will load automatically, and declare extra 
> variables in the global scope 
> (https://screenshot.googleplex.com/9J7JGH8xVUjtfqa). This can have side 
> effects if other libraries try to declare the same variables.





[jira] [Created] (BEAM-11039) Resolve conflicts between TFMA and Facets imports

2020-10-07 Thread Ning Kang (Jira)
Ning Kang created BEAM-11039:


 Summary: Resolve conflicts between TFMA and Facets imports
 Key: BEAM-11039
 URL: https://issues.apache.org/jira/browse/BEAM-11039
 Project: Beam
  Issue Type: Test
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


In a trusted notebook, Facets will load automatically, and declare extra 
variables in the global scope 
(https://screenshot.googleplex.com/9J7JGH8xVUjtfqa). This can have side effects 
if other libraries try to declare the same variables.





[jira] [Updated] (BEAM-11025) PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit

2020-10-06 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-11025:
-
Status: Resolved  (was: In Progress)

> PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in 
> precommit
> -
>
> Key: BEAM-11025
> URL: https://issues.apache.org/jira/browse/BEAM-11025
> Project: Beam
>  Issue Type: Test
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/
> Error Message
> AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
> Stacktrace
> self = <apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest
>   testMethod=test_dynamic_plotting_return_handle>
> def test_dynamic_plotting_return_handle(self):
>   h = pv.visualize(
>   self._stream, dynamic_plotting_interval=1, display_facets=True)
> > self.assertIsInstance(h, timeloop.Timeloop)
> E AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: 
> AssertionError
> Standard Output
> 
> 
>0
> 0  0
> 1  1
> 2  2
> 3  3
> 4  4





[jira] [Work started] (BEAM-11025) PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit

2020-10-06 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-11025 started by Ning Kang.

> PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in 
> precommit
> -
>
> Key: BEAM-11025
> URL: https://issues.apache.org/jira/browse/BEAM-11025
> Project: Beam
>  Issue Type: Test
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/
> Error Message
> AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
> Stacktrace
> self = <apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest
>   testMethod=test_dynamic_plotting_return_handle>
> def test_dynamic_plotting_return_handle(self):
>   h = pv.visualize(
>   self._stream, dynamic_plotting_interval=1, display_facets=True)
> > self.assertIsInstance(h, timeloop.Timeloop)
> E AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: 
> AssertionError
> Standard Output
> 
> 
>0
> 0  0
> 1  1
> 2  2
> 3  3
> 4  4





[jira] [Commented] (BEAM-11025) PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit

2020-10-06 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209037#comment-17209037
 ] 

Ning Kang commented on BEAM-11025:
--

Based on the standard output, it looks like the `None` is returned by one of
the failure paths taken when the environment is not considered to be a
notebook.

There could be a race condition when tests run in parallel and modify the
global instance `ib.current_env()`.
In that case, the test should be patched to use a mocked `is_in_notebook`
check.
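
A minimal sketch of such a patch, using only the standard library. Everything
here is a stand-in: `FakeEnvironment`, `current_env` and `visualize` are
hypothetical and only mimic the shape of the problem (a shared environment
whose `is_in_notebook` value decides whether a handle or `None` is returned).
The assumption is that `is_in_notebook` is exposed as a property.

```python
import unittest
from unittest.mock import PropertyMock, patch


class FakeEnvironment:
    """Stand-in for the shared interactive environment (hypothetical)."""

    @property
    def is_in_notebook(self):
        return False  # what a test process outside a notebook would see


_ENV = FakeEnvironment()


def current_env():
    return _ENV


def visualize(data):
    """Return a handle in a notebook; otherwise take the `None` failure path."""
    if not current_env().is_in_notebook:
        return None
    return {"handle_for": data}


class VisualizeTest(unittest.TestCase):
    def test_returns_handle_when_in_notebook(self):
        # Pin the check for the duration of the test so that other tests
        # touching the shared environment cannot change the outcome.
        with patch.object(FakeEnvironment, "is_in_notebook",
                          new_callable=PropertyMock, return_value=True):
            self.assertIsNotNone(visualize([1, 2, 3]))
        # Outside the patch, the original behavior is restored.
        self.assertIsNone(visualize([1, 2, 3]))
```

Patching the property on the class keeps the test independent of whatever
state the global environment happens to be in when the test runs.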

> PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in 
> precommit
> -
>
> Key: BEAM-11025
> URL: https://issues.apache.org/jira/browse/BEAM-11025
> Project: Beam
>  Issue Type: Test
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/
> Error Message
> AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
> Stacktrace
> self = <apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest
>   testMethod=test_dynamic_plotting_return_handle>
> def test_dynamic_plotting_return_handle(self):
>   h = pv.visualize(
>   self._stream, dynamic_plotting_interval=1, display_facets=True)
> > self.assertIsInstance(h, timeloop.Timeloop)
> E AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: 
> AssertionError
> Standard Output
> 
> 
>0
> 0  0
> 1  1
> 2  2
> 3  3
> 4  4





[jira] [Created] (BEAM-11025) PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit

2020-10-06 Thread Ning Kang (Jira)
Ning Kang created BEAM-11025:


 Summary: 
PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in 
precommit
 Key: BEAM-11025
 URL: https://issues.apache.org/jira/browse/BEAM-11025
 Project: Beam
  Issue Type: Test
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/

Error Message
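AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>
Stacktrace
self = <apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest
  testMethod=test_dynamic_plotting_return_handle>

def test_dynamic_plotting_return_handle(self):
  h = pv.visualize(
  self._stream, dynamic_plotting_interval=1, display_facets=True)
> self.assertIsInstance(h, timeloop.Timeloop)
E AssertionError: None is not an instance of <class 'timeloop.app.Timeloop'>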

apache_beam/runners/interactive/display/pcoll_visualization_test.py:93: 
AssertionError
Standard Output


   0
0  0
1  1
2  2
3  3
4  4





[jira] [Updated] (BEAM-10996) AssertionError: (10, <class 'int'>) when writing TF Records

2020-10-05 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10996:
-
Status: Open  (was: Triage Needed)

> AssertionError: (10, <class 'int'>) when writing TF Records
> ---
>
> Key: BEAM-10996
> URL: https://issues.apache.org/jira/browse/BEAM-10996
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.22.0
>Reporter: Valliappa Lakshmanan
>Priority: P1
> Attachments: repro.py
>
>
> This code snippet:
>  
> def create_tfrecord(x):
>   size = np.array([2.0, 3.0])
>   tfexample = tf.train.Example( 
>      features=tf.train.Features(
>         feature={
>            'size': tf.train.Feature(float_list=tf.train.FloatList(value=size))
>    }))
>   return tfexample.SerializeToString()
>  
> ...
> beam.FlatMap(lambda x: create_tfrecord(x))
> ...
>  
> throws this error:
>  
> Traceback (most recent call last): File "apache_beam/runners/common.py", line 
> 961, in apache_beam.runners.common.DoFnRunner.process File 
> "apache_beam/runners/common.py", line 726, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process File 
> "apache_beam/runners/common.py", line 814, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, 
> in process self.writer.write(element) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 420, in write self.sink.write_record(self.temp_handle, value) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 146, in write_record self.write_encoded_record(file_handle, 
> self.coder.encode(value)) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 
> 463, in encode return self.get_impl().encode(value) File 
> "apache_beam/coders/coder_impl.py", line 494, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode File 
> "apache_beam/coders/coder_impl.py", line 495, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, 
> <class 'int'>)





[jira] [Updated] (BEAM-10996) AssertionError: (10, <class 'int'>) when writing TF Records

2020-10-05 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10996:
-
Status: Triage Needed  (was: Open)

> AssertionError: (10, <class 'int'>) when writing TF Records
> ---
>
> Key: BEAM-10996
> URL: https://issues.apache.org/jira/browse/BEAM-10996
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.22.0
>Reporter: Valliappa Lakshmanan
>Priority: P1
> Attachments: repro.py
>
>
> This code snippet:
>  
> def create_tfrecord(x):
>   size = np.array([2.0, 3.0])
>   tfexample = tf.train.Example( 
>      features=tf.train.Features(
>         feature={
>            'size': tf.train.Feature(float_list=tf.train.FloatList(value=size))
>    }))
>   return tfexample.SerializeToString()
>  
> ...
> beam.FlatMap(lambda x: create_tfrecord(x))
> ...
>  
> throws this error:
>  
> Traceback (most recent call last): File "apache_beam/runners/common.py", line 
> 961, in apache_beam.runners.common.DoFnRunner.process File 
> "apache_beam/runners/common.py", line 726, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process File 
> "apache_beam/runners/common.py", line 814, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, 
> in process self.writer.write(element) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 420, in write self.sink.write_record(self.temp_handle, value) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 146, in write_record self.write_encoded_record(file_handle, 
> self.coder.encode(value)) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 
> 463, in encode return self.get_impl().encode(value) File 
> "apache_beam/coders/coder_impl.py", line 494, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode File 
> "apache_beam/coders/coder_impl.py", line 495, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, 
> <class 'int'>)





[jira] [Assigned] (BEAM-10996) AssertionError: (10, <class 'int'>) when writing TF Records

2020-10-05 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reassigned BEAM-10996:


Assignee: (was: Ning Kang)

> AssertionError: (10, <class 'int'>) when writing TF Records
> ---
>
> Key: BEAM-10996
> URL: https://issues.apache.org/jira/browse/BEAM-10996
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.22.0
>Reporter: Valliappa Lakshmanan
>Priority: P1
> Attachments: repro.py
>
>
> This code snippet:
>  
> def create_tfrecord(x):
>   size = np.array([2.0, 3.0])
>   tfexample = tf.train.Example( 
>      features=tf.train.Features(
>         feature={
>            'size': tf.train.Feature(float_list=tf.train.FloatList(value=size))
>    }))
>   return tfexample.SerializeToString()
>  
> ...
> beam.FlatMap(lambda x: create_tfrecord(x))
> ...
>  
> throws this error:
>  
> Traceback (most recent call last): File "apache_beam/runners/common.py", line 
> 961, in apache_beam.runners.common.DoFnRunner.process File 
> "apache_beam/runners/common.py", line 726, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process File 
> "apache_beam/runners/common.py", line 814, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, 
> in process self.writer.write(element) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 420, in write self.sink.write_record(self.temp_handle, value) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 146, in write_record self.write_encoded_record(file_handle, 
> self.coder.encode(value)) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 
> 463, in encode return self.get_impl().encode(value) File 
> "apache_beam/coders/coder_impl.py", line 494, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode File 
> "apache_beam/coders/coder_impl.py", line 495, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, 
> <class 'int'>)





[jira] [Commented] (BEAM-10996) AssertionError: (10, <class 'int'>) when writing TF Records

2020-10-05 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208270#comment-17208270
 ] 

Ning Kang commented on BEAM-10996:
--

First, to clarify, this is not related to the notebook: the pipeline fails no
matter where it runs.
Second, there are 2 problems with the pipeline:
1. When using `beam.FlatMap`, the 3 serialized strings are flattened into
dozens of integers. The assertion error occurs because the elements of the
PCollection are now of type `int`. You should use `beam.Map` instead of
`beam.FlatMap`. If the flattening is intended, you have to convert the int
elements into strings, for example by appending another transform
`beam.Map(lambda x: str(x))`.
2. The output of `SerializeToString` is not decodable by Beam. You can wrap it
as `return base64.b64encode(tfexample.SerializeToString()).decode('utf-8')`.
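
The first problem can be reproduced without Beam. The sketch below is a
simplified stand-in, not Beam's implementation: `flat_map` and `map_each` are
hypothetical helpers that mimic how `FlatMap` iterates over each result while
`Map` emits it whole, and `serialize` fakes `SerializeToString()` with a
two-byte payload. In Python 3, iterating over a `bytes` object yields `int`
values, which is why a bytes coder downstream sees integers.

```python
def flat_map(fn, elements):
    """Mimic FlatMap: iterate over each result and emit its items."""
    return [item for element in elements for item in fn(element)]


def map_each(fn, elements):
    """Mimic Map: emit each result as a single element."""
    return [fn(element) for element in elements]


def serialize(x):
    """Stand-in for SerializeToString(): returns a short bytes payload."""
    return bytes([10, x])


flattened = flat_map(serialize, [1, 2, 3])
mapped = map_each(serialize, [1, 2, 3])

print(flattened)  # [10, 1, 10, 2, 10, 3]
print(mapped)     # [b'\n\x01', b'\n\x02', b'\n\x03']
```

With the `flat_map` variant, every element is an `int` rather than `bytes`,
so the first element handed to a bytes coder would be `10`.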

> AssertionError: (10, <class 'int'>) when writing TF Records
> ---
>
> Key: BEAM-10996
> URL: https://issues.apache.org/jira/browse/BEAM-10996
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.22.0
>Reporter: Valliappa Lakshmanan
>Assignee: Ning Kang
>Priority: P1
> Attachments: repro.py
>
>
> This code snippet:
>  
> def create_tfrecord(x):
>   size = np.array([2.0, 3.0])
>   tfexample = tf.train.Example( 
>      features=tf.train.Features(
>         feature={
>            'size': tf.train.Feature(float_list=tf.train.FloatList(value=size))
>    }))
>   return tfexample.SerializeToString()
>  
> ...
> beam.FlatMap(lambda x: create_tfrecord(x))
> ...
>  
> throws this error:
>  
> Traceback (most recent call last): File "apache_beam/runners/common.py", line 
> 961, in apache_beam.runners.common.DoFnRunner.process File 
> "apache_beam/runners/common.py", line 726, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process File 
> "apache_beam/runners/common.py", line 814, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, 
> in process self.writer.write(element) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 420, in write self.sink.write_record(self.temp_handle, value) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 146, in write_record self.write_encoded_record(file_handle, 
> self.coder.encode(value)) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 
> 463, in encode return self.get_impl().encode(value) File 
> "apache_beam/coders/coder_impl.py", line 494, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode File 
> "apache_beam/coders/coder_impl.py", line 495, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, 
> <class 'int'>)





[jira] [Assigned] (BEAM-10996) AssertionError: (10, <class 'int'>) when writing TF Records

2020-10-05 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reassigned BEAM-10996:


Assignee: Ning Kang

> AssertionError: (10, <class 'int'>) when writing TF Records
> ---
>
> Key: BEAM-10996
> URL: https://issues.apache.org/jira/browse/BEAM-10996
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.22.0
>Reporter: Valliappa Lakshmanan
>Assignee: Ning Kang
>Priority: P1
> Attachments: repro.py
>
>
> This code snippet:
>  
> def create_tfrecord(x):
>   size = np.array([2.0, 3.0])
>   tfexample = tf.train.Example( 
>      features=tf.train.Features(
>         feature={
>            'size': tf.train.Feature(float_list=tf.train.FloatList(value=size))
>    }))
>   return tfexample.SerializeToString()
>  
> ...
> beam.FlatMap(lambda x: create_tfrecord(x))
> ...
>  
> throws this error:
>  
> Traceback (most recent call last): File "apache_beam/runners/common.py", line 
> 961, in apache_beam.runners.common.DoFnRunner.process File 
> "apache_beam/runners/common.py", line 726, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process File 
> "apache_beam/runners/common.py", line 814, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1061, 
> in process self.writer.write(element) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 420, in write self.sink.write_record(self.temp_handle, value) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 146, in write_record self.write_encoded_record(file_handle, 
> self.coder.encode(value)) File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 
> 463, in encode return self.get_impl().encode(value) File 
> "apache_beam/coders/coder_impl.py", line 494, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode File 
> "apache_beam/coders/coder_impl.py", line 495, in 
> apache_beam.coders.coder_impl.BytesCoderImpl.encode AssertionError: (10, 
> <class 'int'>)





[jira] [Updated] (BEAM-10775) Create precommit job for typescript tests

2020-09-01 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10775:
-
Status: Resolved  (was: In Progress)

> Create precommit job for typescript tests
> -
>
> Key: BEAM-10775
> URL: https://issues.apache.org/jira/browse/BEAM-10775
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Create a precommit job that runs Jest tests and ESLint checks for TypeScript
> code in the Beam repo.
> The currently known TypeScript code is:
> * the side panel extension, located under
> sdks/python/apache_beam/runners/interactive/extensions





[jira] [Updated] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core

2020-08-28 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10827:
-
Status: Resolved  (was: In Progress)

> google-cloud-spanner is incompatible with google-cloud-core
> ---
>
> Key: BEAM-10827
> URL: https://issues.apache.org/jira/browse/BEAM-10827
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Running into an error when building a Beam notebook container:
> {code:java}
> ERROR: google-cloud-spanner 1.18.0 has requirement 
> google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 
> which is incompatible.
> {code}





[jira] [Updated] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core

2020-08-28 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10827:
-
Status: Resolved  (was: Resolved)

> google-cloud-spanner is incompatible with google-cloud-core
> ---
>
> Key: BEAM-10827
> URL: https://issues.apache.org/jira/browse/BEAM-10827
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Running into an error when building a Beam notebook container:
> {code:java}
> ERROR: google-cloud-spanner 1.18.0 has requirement 
> google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 
> which is incompatible.
> {code}





[jira] [Commented] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core

2020-08-28 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186740#comment-17186740
 ] 

Ning Kang commented on BEAM-10827:
--

[~pabloem] Yes. Marked it as resolved.

> google-cloud-spanner is incompatible with google-cloud-core
> ---
>
> Key: BEAM-10827
> URL: https://issues.apache.org/jira/browse/BEAM-10827
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Running into an error when building a Beam notebook container:
> {code:java}
> ERROR: google-cloud-spanner 1.18.0 has requirement 
> google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 
> which is incompatible.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core

2020-08-28 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10827 started by Ning Kang.

> google-cloud-spanner is incompatible with google-cloud-core
> ---
>
> Key: BEAM-10827
> URL: https://issues.apache.org/jira/browse/BEAM-10827
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Running into an error when building a Beam notebook container:
> {code:java}
> ERROR: google-cloud-spanner 1.18.0 has requirement 
> google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 
> which is incompatible.
> {code}





[jira] [Commented] (BEAM-8551) Beam Python containers should include all Beam SDK dependencies, and not have conflicting dependencies

2020-08-27 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186180#comment-17186180
 ] 

Ning Kang commented on BEAM-8551:
-

The Python Docker precommit and cron jobs do not error out when there is a
dependency conflict: https://screenshot.googleplex.com/AVSmsuBCBuEijQR
The behavior might change after October 2020.

We'll check whether we can add a `pip check` statement to the jobs.
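
A minimal sketch of what such a step could look like if it were driven from
Python. It relies on `pip check` exiting with a non-zero status when installed
packages have incompatible requirements. The helper name `check_dependencies`
and the injectable `run` parameter are assumptions made for illustration, not
part of any existing Beam job.

```python
import subprocess
import sys


def check_dependencies(run=subprocess.run):
    """Run `pip check` and return (ok, report).

    `run` is injectable so the function can be exercised without a real pip.
    """
    completed = run(
        [sys.executable, "-m", "pip", "check"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False)
    return completed.returncode == 0, completed.stdout.strip()
```

A job could call `check_dependencies()` after installing apache-beam in the
container and fail the build whenever `ok` is false, printing `report` for
diagnosis.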

> Beam Python containers should include all Beam SDK dependencies, and not have 
> conflicting dependencies
> --
>
> Key: BEAM-8551
> URL: https://issues.apache.org/jira/browse/BEAM-8551
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: P1
>
> Checks could be introduced during container creation, and be enforced by 
> ValidatesContainer test suites. We could:
> - Check pip output or status code for incompatible dependency errors.
> - Remove internet access when installing apache-beam in the container, to 
> make sure all dependencies are installed.





[jira] [Created] (BEAM-10827) google-cloud-spanner is incompatible with google-cloud-core

2020-08-27 Thread Ning Kang (Jira)
Ning Kang created BEAM-10827:


 Summary: google-cloud-spanner is incompatible with 
google-cloud-core
 Key: BEAM-10827
 URL: https://issues.apache.org/jira/browse/BEAM-10827
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Ning Kang
Assignee: Ning Kang


Running into an error when building a Beam notebook container:


{code:java}
ERROR: google-cloud-spanner 1.18.0 has requirement 
google-cloud-core<2.0dev,>=1.4.1, but you'll have google-cloud-core 1.1.0 which 
is incompatible.
{code}







[jira] [Commented] (BEAM-10771) Create whitespacelint precommit for non code files in Beam repo

2020-08-27 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185995#comment-17185995
 ] 

Ning Kang commented on BEAM-10771:
--

[~lcwik]
Yes, it's done and 
[announced|https://lists.apache.org/thread.html/r5e8715301a2cafcd2aa1d972aa4ceb52247b6d7450cd8924fb6d1508%40%3Cdev.beam.apache.org%3E].

The [problem|https://github.com/apache/beam/pull/12689] we had yesterday was 
caused by a forced merge: whitespacelint had already caught the problem in 
the PR.

> Create whitespacelint precommit for non code files in Beam repo
> ---
>
> Key: BEAM-10771
> URL: https://issues.apache.org/jira/browse/BEAM-10771
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Create a precommit job that runs on changes to non-code files in Beam to find 
> trailing whitespace and report it.
> Supported non-code file types are:
> * *.md (markdown files)
> * build.gradle (gradle build files)





[jira] [Updated] (BEAM-10771) Create whitespacelint precommit for non code files in Beam repo

2020-08-26 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10771:
-
Status: Resolved  (was: In Progress)

> Create whitespacelint precommit for non code files in Beam repo
> ---
>
> Key: BEAM-10771
> URL: https://issues.apache.org/jira/browse/BEAM-10771
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Create a precommit job that runs on changes to non-code files in Beam to find 
> trailing whitespace and report it.
> Supported non-code file types are:
> * *.md (markdown files)
> * build.gradle (gradle build files)





[jira] [Work started] (BEAM-10771) Create whitespacelint precommit for non code files in Beam repo

2020-08-21 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10771 started by Ning Kang.

> Create whitespacelint precommit for non code files in Beam repo
> ---
>
> Key: BEAM-10771
> URL: https://issues.apache.org/jira/browse/BEAM-10771
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Create a precommit job that runs on changes to non-code files in Beam to find 
> trailing whitespace and report it.
> Supported non-code file types are:
> * *.md (markdown files)
> * build.gradle (gradle build files)





[jira] [Work started] (BEAM-10764) Make is_in_ipython robust

2020-08-21 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10764 started by Ning Kang.

> Make is_in_ipython robust
> -
>
> Key: BEAM-10764
> URL: https://issues.apache.org/jira/browse/BEAM-10764
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> `is_in_ipython` determines whether the current code is executing within an 
> IPython environment by attempting to fetch an IPython kernel through 
> `IPython.get_ipython()`.
> If the IPython dependency is not available or `None` is fetched, the result 
> is False.
> We've seen some users with a corrupted IPython dependency in their code 
> base.
> If an IPython dependency is present but throws a non-ImportError exception, 
> it breaks Beam usage.
> I assume similar errors would happen if the user uses an IPython 
> dependency outside the range of versions in setup.py.
> I decided to make the function best effort so that it always returns False 
> when errors occur.





[jira] [Updated] (BEAM-10764) Make is_in_ipython robust

2020-08-21 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10764:
-
Status: Resolved  (was: In Progress)

> Make is_in_ipython robust
> -
>
> Key: BEAM-10764
> URL: https://issues.apache.org/jira/browse/BEAM-10764
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> `is_in_ipython` determines whether the current code is executing within an 
> IPython environment by attempting to fetch an IPython kernel through 
> `IPython.get_ipython()`.
> If the IPython dependency is not available or `None` is fetched, the result 
> is False.
> We've seen some users with a corrupted IPython dependency in their code 
> base.
> If an IPython dependency is present but throws a non-ImportError exception, 
> it breaks Beam usage.
> I assume similar errors would happen if the user uses an IPython 
> dependency outside the range of versions in setup.py.
> I decided to make the function best effort so that it always returns False 
> when errors occur.





[jira] [Updated] (BEAM-10775) Create precommit job for typescript tests

2020-08-20 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10775:
-
Description: 
Create a precommit job that runs jest tests and eslint checks for typescript 
code in the Beam repo.

Currently known typescript code is:
the side panel extension - located under 
sdks/python/apache_beam/runners/interactive/extensions

  was:Create a precommit job that runs jest tests for the side panel extension


> Create precommit job for typescript tests
> -
>
> Key: BEAM-10775
> URL: https://issues.apache.org/jira/browse/BEAM-10775
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>
> Create a precommit job that runs jest tests and eslint checks for typescript 
> code in the Beam repo.
> Currently known typescript code is:
> the side panel extension - located under 
> sdks/python/apache_beam/runners/interactive/extensions





[jira] [Updated] (BEAM-10775) Create precommit job for typescript tests

2020-08-20 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10775:
-
Summary: Create precommit job for typescript tests  (was: Create precommit 
job for jest tests of side panel extension)

> Create precommit job for typescript tests
> -
>
> Key: BEAM-10775
> URL: https://issues.apache.org/jira/browse/BEAM-10775
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>
> Create a precommit job that runs jest tests for the side panel extension





[jira] [Work started] (BEAM-10775) Create precommit job for jest tests of side panel extension

2020-08-20 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10775 started by Ning Kang.

> Create precommit job for jest tests of side panel extension
> ---
>
> Key: BEAM-10775
> URL: https://issues.apache.org/jira/browse/BEAM-10775
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>
> Create a precommit job that runs jest tests for the side panel extension





[jira] [Created] (BEAM-10775) Create precommit job for jest tests of side panel extension

2020-08-20 Thread Ning Kang (Jira)
Ning Kang created BEAM-10775:


 Summary: Create precommit job for jest tests of side panel 
extension
 Key: BEAM-10775
 URL: https://issues.apache.org/jira/browse/BEAM-10775
 Project: Beam
  Issue Type: Improvement
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


Create a precommit job that runs jest tests for the side panel extension





[jira] [Created] (BEAM-10771) Create whitespacelint precommit for non code files in Beam repo

2020-08-19 Thread Ning Kang (Jira)
Ning Kang created BEAM-10771:


 Summary: Create whitespacelint precommit for non code files in 
Beam repo
 Key: BEAM-10771
 URL: https://issues.apache.org/jira/browse/BEAM-10771
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Ning Kang
Assignee: Ning Kang


Create a precommit job that runs on changes to non-code files in Beam to find 
trailing whitespace and report it.

Supported non-code file types are:
* *.md (markdown files)
* build.gradle (gradle build files)
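
The check described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual whitespacelint implementation: the names `SUPPORTED_PATTERNS`, `is_supported`, and `trailing_whitespace_lines` are hypothetical, and only the two file types listed above are matched.

```python
import fnmatch
import os

# File-name patterns taken from the list of supported non-code file types.
SUPPORTED_PATTERNS = ("*.md", "build.gradle")


def is_supported(path):
    """True if the file name matches one of the supported patterns."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in SUPPORTED_PATTERNS)


def trailing_whitespace_lines(text):
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip()
    ]
```

A precommit job could run `trailing_whitespace_lines` over every changed file for which `is_supported` is true and fail when any list is non-empty.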







[jira] [Created] (BEAM-10764) Make is_in_ipython robust

2020-08-19 Thread Ning Kang (Jira)
Ning Kang created BEAM-10764:


 Summary: Make is_in_ipython robust
 Key: BEAM-10764
 URL: https://issues.apache.org/jira/browse/BEAM-10764
 Project: Beam
  Issue Type: Improvement
  Components: runner-py-interactive
Reporter: Ning Kang
Assignee: Ning Kang


`is_in_ipython` determines whether the current code is executing within an 
IPython environment by attempting to fetch an IPython kernel through 
`IPython.get_ipython()`.

If the IPython dependency is not available or `None` is fetched, the result 
is False.

We've seen some users with a corrupted IPython dependency in their code 
base.
If an IPython dependency is present but throws a non-ImportError exception, it 
breaks Beam usage.
I assume similar errors would happen if the user uses an IPython dependency 
outside the range of versions in setup.py.

I decided to make the function best effort so that it always returns False when 
errors occur.
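
A minimal sketch of the best-effort behavior described above, assuming the only goal is "never raise, return False on any failure". It is not the code that was merged; the actual change may log the error or handle specific exceptions differently.

```python
def is_in_ipython():
    """Best-effort check for an IPython environment.

    Returns True only when IPython imports cleanly and reports an active
    shell. A missing, corrupted, or incompatible IPython install results
    in False instead of an exception.
    """
    try:
        from IPython import get_ipython
        return get_ipython() is not None
    except Exception:  # covers ImportError and any other runtime failure
        return False
```

Catching `Exception` rather than a bare `except` still lets `KeyboardInterrupt` and `SystemExit` propagate.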







[jira] [Work stopped] (BEAM-10637) Fix start/stop hanging issue on test stream service controller

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10637 stopped by Ning Kang.

> Fix start/stop hanging issue on test stream service controller
> --
>
> Key: BEAM-10637
> URL: https://issues.apache.org/jira/browse/BEAM-10637
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test stream controller sometimes hangs forever when the underlying gRPC 
> server is started or stopped at the wrong time:
> e.g., stopping a server that was never started, stopping a server that has 
> already stopped, or starting a server that has already started.
> The change adds start/stop states to handle all 6 combinations:
> # start server not started
> # start server started
> # start server stopped
> # stop server not started
> # stop server started
> # stop server stopped
> So that it never hangs.
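
The six combinations above can be sketched as a small guarded state machine. This is an illustration only: `ServerState`, `GuardedController`, and `FakeServer` are hypothetical names, the fake stands in for a gRPC server, and the real controller may track its state differently (for example with boolean flags).

```python
import enum
import threading


class ServerState(enum.Enum):
    NOT_STARTED = 1
    STARTED = 2
    STOPPED = 3


class FakeServer:
    """Stand-in for a gRPC server; records which calls reach it."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self, grace):
        self.calls.append("stop")


class GuardedController:
    """Forwards start/stop to the server only when the state allows it."""

    def __init__(self, server):
        self._server = server
        self._state = ServerState.NOT_STARTED
        self._lock = threading.Lock()

    def start(self):
        with self._lock:
            # Starting an already started or already stopped server is a no-op.
            if self._state is not ServerState.NOT_STARTED:
                return False
            self._server.start()
            self._state = ServerState.STARTED
            return True

    def stop(self):
        with self._lock:
            # Stopping a never-started or already stopped server is a no-op.
            if self._state is not ServerState.STARTED:
                return False
            self._server.stop(0)
            self._state = ServerState.STOPPED
            return True
```

Only two of the six combinations (start when not started, stop when started) reach the underlying server; the other four return immediately, which is what prevents the hang.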





[jira] [Updated] (BEAM-10637) Fix start/stop hanging issue on test stream service controller

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10637:
-
Status: Resolved  (was: Open)

> Fix start/stop hanging issue on test stream service controller
> --
>
> Key: BEAM-10637
> URL: https://issues.apache.org/jira/browse/BEAM-10637
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test stream controller sometimes hangs forever when the underlying gRPC 
> server is started or stopped at the wrong time:
> e.g., stopping a server that was never started, stopping a server that has 
> already stopped, or starting a server that has already started.
> The change adds start/stop states to handle all 6 combinations:
> # start server not started
> # start server started
> # start server stopped
> # stop server not started
> # stop server started
> # stop server stopped
> So that it never hangs.





[jira] [Work started] (BEAM-10637) Fix start/stop hanging issue on test stream service controller

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10637 started by Ning Kang.

> Fix start/stop hanging issue on test stream service controller
> --
>
> Key: BEAM-10637
> URL: https://issues.apache.org/jira/browse/BEAM-10637
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test stream controller sometimes hangs forever when the underlying gRPC 
> server is started or stopped at the wrong time:
> e.g., stopping a server that was never started, stopping a server that has 
> already stopped, or starting a server that has already started.
> The change adds start/stop states to handle all 6 combinations:
> # start server not started
> # start server started
> # start server stopped
> # stop server not started
> # stop server started
> # stop server stopped
> So that it never hangs.





[jira] [Updated] (BEAM-10637) Fix start/stop hanging issue on test stream service controller

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10637:
-
Status: Open  (was: Triage Needed)

> Fix start/stop hanging issue on test stream service controller
> --
>
> Key: BEAM-10637
> URL: https://issues.apache.org/jira/browse/BEAM-10637
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test stream controller sometimes hangs forever when the underlying gRPC 
> server is started or stopped at the wrong time:
> e.g., stopping a server that was never started, stopping a server that has 
> already stopped, or starting a server that has already started.
> The change adds start/stop states to handle all 6 combinations:
> # start server not started
> # start server started
> # start server stopped
> # stop server not started
> # stop server started
> # stop server stopped
> So that it never hangs.





[jira] [Reopened] (BEAM-10637) Fix start/stop hanging issue on test stream service controller

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang reopened BEAM-10637:
--

Reopening to try to set the resolution to Resolved.

> Fix start/stop hanging issue on test stream service controller
> --
>
> Key: BEAM-10637
> URL: https://issues.apache.org/jira/browse/BEAM-10637
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test stream controller sometimes hangs forever when the underlying gRPC 
> server is started or stopped at the wrong time:
> e.g., stopping a server that was never started, stopping a server that has 
> already stopped, or starting a server that has already started.
> The change adds start/stop states to handle all 6 combinations:
> # start server not started
> # start server started
> # start server stopped
> # stop server not started
> # stop server started
> # stop server stopped
> So that it never hangs.





[jira] [Commented] (BEAM-10627) tests fails on windows - interactive tests fails due to FileNotFoundError

2020-08-11 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175728#comment-17175728
 ] 

Ning Kang commented on BEAM-10627:
--

[~TobKed] I don't think there is any difference in terms of these tests between 
Py3.5 and Py3.7.
But many interactive features require Py3.6+ to work, so it's okay to skip 
all these tests for Py3.5.
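
One way to express such a skip is a version guard with `unittest.skipIf`. The sketch below uses a placeholder test class and body (`InteractiveFeatureTest` is hypothetical and is not one of the failing tests listed in the issue).

```python
import sys
import unittest


@unittest.skipIf(
    sys.version_info < (3, 6),
    "Interactive features require Python 3.6+.")
class InteractiveFeatureTest(unittest.TestCase):
    """Placeholder test case; skipped as a whole on Python 3.5 and older."""

    def test_runs_on_supported_python(self):
        # Only reached on interpreters that pass the version guard above.
        self.assertGreaterEqual(sys.version_info[:2], (3, 6))
```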

> tests fails on windows - interactive tests fails due to FileNotFoundError
> -
>
> Key: BEAM-10627
> URL: https://issues.apache.org/jira/browse/BEAM-10627
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Tobiasz Kedzierski
>Assignee: Ning Kang
>Priority: P2
> Attachments: BEAM-10627.txt
>
>
> Failing tests:
> apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest.test_basic
> apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest.test_wordcount
> apache_beam.runners.interactive.interactive_beam_test.InteractiveBeamTest.test_show_always_watch_given_pcolls
> apache_beam.runners.interactive.interactive_beam_test.InteractiveBeamTest.test_show_mark_pcolls_computed_when_done
> Link to the github workflow run with mentioned error:
> [https://github.com/TobKed/beam/runs/937336438?check_suite_focus=true]
> partial log:
> 2020-08-02T11:05:43.5852779Z ___ 
> InteractiveBeamTest.test_show_always_watch_given_pcolls ___
> 2020-08-02T11:05:43.5853476Z [gw3] win32 -- Python 3.5.4 
> d:\a\beam\beam\sdks\python\target\.tox\py35-win\scripts\python.exe
> 2020-08-02T11:05:43.5853847Z 
> 2020-08-02T11:05:43.5855313Z self = 
>  testMethod=test_show_always_watch_given_pcolls>
> 2020-08-02T11:05:43.5855658Z 
> 2020-08-02T11:05:43.5855975Z def 
> test_show_always_watch_given_pcolls(self):
> 2020-08-02T11:05:43.5856278Z   p = beam.Pipeline(ir.InteractiveRunner())
> 2020-08-02T11:05:43.5856566Z   # pylint: 
> disable=range-builtin-not-iterating
> 2020-08-02T11:05:43.5856845Z   pcoll = p | 'Create' >> 
> beam.Create(range(10))
> 2020-08-02T11:05:43.5857355Z   # The pcoll is not watched since 
> watch(locals()) is not explicitly called.
> 2020-08-02T11:05:43.5858106Z   self.assertFalse(pcoll in 
> _get_watched_pcollections_with_variable_names())
> 2020-08-02T11:05:43.5858620Z   # The call of show watches pcoll.
> 2020-08-02T11:05:43.5859235Z > ib.show(pcoll)
> 2020-08-02T11:05:43.5859475Z 
> 2020-08-02T11:05:43.5860015Z 
> apache_beam\runners\interactive\interactive_beam_test.py:96: 
> 2020-08-02T11:05:43.5861024Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> 2020-08-02T11:05:43.5861944Z apache_beam\runners\interactive\utils.py:205: in 
> run_within_progress_indicator
> 2020-08-02T11:05:43.5862682Z return func(*args, **kwargs)
> 2020-08-02T11:05:43.5863214Z 
> apache_beam\runners\interactive\interactive_beam.py:411: in show
> 2020-08-02T11:05:43.5863760Z result = pf.PipelineFragment(list(pcolls), 
> user_pipeline.options).run()
> 2020-08-02T11:05:43.5864291Z 
> apache_beam\runners\interactive\pipeline_fragment.py:113: in run
> 2020-08-02T11:05:43.5864746Z return self.deduce_fragment().run()
> 2020-08-02T11:05:43.5865292Z apache_beam\pipeline.py:521: in run
> 2020-08-02T11:05:43.5865633Z allow_proto_holders=True).run(False)
> 2020-08-02T11:05:43.5866159Z apache_beam\pipeline.py:534: in run
> 2020-08-02T11:05:43.5866638Z return self.runner.run_pipeline(self, 
> self._options)
> 2020-08-02T11:05:43.5867299Z 
> apache_beam\runners\interactive\interactive_runner.py:194: in run_pipeline
> 2020-08-02T11:05:43.5867667Z pipeline_to_execute.run(), 
> pipeline_instrument)
> 2020-08-02T11:05:43.5868119Z apache_beam\pipeline.py:534: in run
> 2020-08-02T11:05:43.5868627Z return self.runner.run_pipeline(self, 
> self._options)
> 2020-08-02T11:05:43.5869401Z apache_beam\runners\direct\direct_runner.py:119: 
> in run_pipeline
> 2020-08-02T11:05:43.5869735Z return runner.run_pipeline(pipeline, options)
> 2020-08-02T11:05:43.5870201Z 
> apache_beam\runners\portability\fn_api_runner\fn_runner.py:176: in 
> run_pipeline
> 2020-08-02T11:05:43.5870665Z 
> pipeline.to_runner_api(default_environment=self._default_environment))
> 2020-08-02T11:05:43.5871520Z 
> apache_beam\runners\portability\fn_api_runner\fn_runner.py:186: in 
> run_via_runner_api
> 2020-08-02T11:05:43.5871987Z return self.run_stages(stage_context, stages)
> 2020-08-02T11:05:43.5872612Z 
> apache_beam\runners\portability\fn_api_runner\fn_runner.py:344: in run_stages
> 2020-08-02T11:05:43.5872918Z bundle_context_manager,
> 2020-08-02T11:05:43.5873512Z 
> apache_beam\runners\portability\fn_api_runner\fn_runner.py:523: in _run_stage
> 2020-08-02T11:05:43.5873851Z bundle_mana

[jira] [Updated] (BEAM-10637) Fix start/stop hanging issue on test stream service controller

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10637:
-
Status: Resolved  (was: Resolved)

> Fix start/stop hanging issue on test stream service controller
> --
>
> Key: BEAM-10637
> URL: https://issues.apache.org/jira/browse/BEAM-10637
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test stream controller sometimes hangs forever when the underlying gRPC 
> server is started or stopped at the wrong time:
> e.g., stopping a server that was never started, stopping a server that has 
> already stopped, or starting a server that has already started.
> The change adds start/stop states to handle all 6 combinations:
> # start server not started
> # start server started
> # start server stopped
> # stop server not started
> # stop server started
> # stop server stopped
> So that it never hangs.





[jira] [Updated] (BEAM-10637) Fix start/stop hanging issue on test stream service controller

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10637:
-
Status: Resolved  (was: Open)

> Fix start/stop hanging issue on test stream service controller
> --
>
> Key: BEAM-10637
> URL: https://issues.apache.org/jira/browse/BEAM-10637
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test stream controller sometimes hangs forever when the underlying gRPC 
> server is started or stopped at the wrong time:
> e.g., stopping a server that was never started, stopping a server that has 
> already stopped, or starting a server that has already started.
> The change adds start/stop states to handle all 6 combinations:
> # start server not started
> # start server started
> # start server stopped
> # stop server not started
> # stop server started
> # stop server stopped
> So that it never hangs.





[jira] [Updated] (BEAM-10635) Incompatible google-api-core and google-cloud-bigquery

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10635:
-
Status: Resolved  (was: Resolved)

> Incompatible google-api-core and google-cloud-bigquery
> --
>
> Key: BEAM-10635
> URL: https://issues.apache.org/jira/browse/BEAM-10635
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P0
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This bigquery version bump: 
> https://github.com/apache/beam/commit/a315672dbe2b82e013e8120ac5398c623c7b13a7
> seems to have broken the pip check:
> google-cloud-bigquery 1.26.1 has requirement google-api-core<2.0dev,>=1.21.0, 
> but you have google-api-core 1.20.0





[jira] [Updated] (BEAM-10635) Incompatible google-api-core and google-cloud-bigquery

2020-08-11 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-10635:
-
Status: Resolved  (was: Triage Needed)

> Incompatible google-api-core and google-cloud-bigquery
> --
>
> Key: BEAM-10635
> URL: https://issues.apache.org/jira/browse/BEAM-10635
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P0
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This bigquery version bump: 
> https://github.com/apache/beam/commit/a315672dbe2b82e013e8120ac5398c623c7b13a7
> seems to have broken the pip check:
> google-cloud-bigquery 1.26.1 has requirement google-api-core<2.0dev,>=1.21.0, 
> but you have google-api-core 1.20.0




