[jira] [Work logged] (BEAM-8617) Tear down the DoFns upon the control service termination in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8617?focusedWorklogId=347972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347972 ] ASF GitHub Bot logged work on BEAM-8617: Author: ASF GitHub Bot Created on: 22/Nov/19 07:00 Start Date: 22/Nov/19 07:00 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10073: [BEAM-8617] Tear down the DoFns upon the control service termination in Python SDK harness. URL: https://github.com/apache/beam/pull/10073#issuecomment-557416225 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347972) Time Spent: 40m (was: 0.5h) > Tear down the DoFns upon the control service termination in Python SDK harness > -- > > Key: BEAM-8617 > URL: https://issues.apache.org/jira/browse/BEAM-8617 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.18.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Per the discussion in the ML can be found [1], the teardown of DoFns should > be supported in the portability framework. It happens at two places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support to teardown the DoFns upon the control > service termination for Python SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785 ] Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:12 AM: - h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} h2. testPy36Gcp failrue The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing: https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 It seemed that _bz2 library was missing for the Python3.6. Followed [Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558]. h2. testPy35Cython failure {noformat} ./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython ... x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m -c apache_beam/coders/stream.c -o build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or directory #include "Python.h" ^~ compilation terminated. error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 ... FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'. {noformat} Installed {{sudo apt-get install python3-dev}}. It didn't work. was (Author: suztomo): h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFU
[jira] [Comment Edited] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785 ] Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:10 AM: - h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} h2. testPy36Gcp failrue The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing: https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 It seemed that _bz2 library was missing for the Python3.6. Followed [Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558]. h2. testPy35Cython failure {noformat} ./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython ... x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m -c apache_beam/coders/stream.c -o build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or directory #include "Python.h" ^~ compilation terminated. error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 ... FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'. {noformat} Installed {{sudo apt-get install python3-dev}} was (Author: suztomo): h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat
[jira] [Comment Edited] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785 ] Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:09 AM: - h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} h2. testPy36Gcp failrue The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing: https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 It seemed that _bz2 library was missing for the Python3.6. Followed [Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558]. h2. testPy35Cython failure {noformat} ./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython ... x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m -c apache_beam/coders/stream.c -o build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or directory #include "Python.h" ^~ compilation terminated. error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 ... FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'. {noformat} was (Author: suztomo): h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} h2. testPy36Gcp failrue The {{:sdks:python
[jira] [Comment Edited] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785 ] Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:06 AM: - h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} h2. testPy36Gcp failrue The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing: https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 It seemed that _bz2 library was missing for the Python3.6. Followed [Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558]. :sdks:python:test-suites:tox:py35:testPy35Cython {noformat} FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'. {noformat} was (Author: suztomo): h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} h2. testPy36Gcp failrue The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing: https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 It seemed that _bz2 library was missing for the Python3.6. Followed [Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558]. > Python setup issues > --- > > Key: BEAM-8787 > URL: https://issues.apache.org/jira/browse/BEAM-8787 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 > Environment: debian x86 (gLinux) >Reporter: Elliotte Rusty Harold >Priority: Major > > This could be an issue with incomplete or inaccurate contributing docs. tldr;
[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker
[ https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=347918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347918 ] ASF GitHub Bot logged work on BEAM-8746: Author: ASF GitHub Bot Created on: 22/Nov/19 05:00 Start Date: 22/Nov/19 05:00 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #10161: [BEAM-8746] Make local job service accessible from external machines URL: https://github.com/apache/beam/pull/10161#discussion_r349433786 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, pipeline, options): def start_grpc_server(self, port=0): self._server = grpc.server(UnboundedThreadPoolExecutor()) -port = self._server.add_insecure_port('localhost:%d' % port) +port = self._server.add_insecure_port('[::]:%d' % port) Review comment: ok, I've been playing around with this, and the main impediment to keeping this design simple is that there are two separate hostnames required, one for opening the port for the server, and one which is delivered to the client for reconnecting to the staging service. I think it'd be nice to prevent people from having to figure out all this stuff again, because it's pretty frustrating to get it right, so here's my best effort at a compromise between making this configurable and making it "just work". ```python def get_hostname(self): """Return the host name at which this server will be accessible. In particular, this is provided to the client as the artifact staging endpoint. """ return 'localhost' def start_grpc_server(self, port=0): self._server = grpc.server(UnboundedThreadPoolExecutor()) hostname = self.get_hostname() # either open this up to the world, or lock it down to localhost if os.environ.get('DOCKER_MAC_CONTAINER') == '1' or hostname != 'localhost': service_address = '[::]' else: service_address = 'localhost' port = self._server.add_insecure_port('%s:%d' % (service_address, port)) beam_job_api_pb2_grpc.add_JobServiceServicer_to_server(self, self._server) beam_artifact_api_pb2_grpc.add_ArtifactStagingServiceServicer_to_server( self._artifact_service, self._server) self._artifact_staging_endpoint = endpoints_pb2.ApiServiceDescriptor( url='%s:%d' % (hostname, port)) self._server.start() _LOGGER.info('Grpc server started at %s on port %d' % (hostname, port)) return port ``` What do you think? The other option is that I just copy all of `start_grpc_server` into my sub-class. It's not the end of the world if that's the decision we come to. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347918) Time Spent: 1.5h (was: 1h 20m) > Allow the local job service to work from inside docker > -- > > Key: BEAM-8746 > URL: https://issues.apache.org/jira/browse/BEAM-8746 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently the connection is refused. It's a simple fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies
[ https://issues.apache.org/jira/browse/BEAM-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979820#comment-16979820 ] Kenneth Knowles commented on BEAM-8747: --- Nice work! Fully qualified names definitely circumvent my {{grep}} analysis. > Remove Unused non-vendored Guava compile dependencies > - > > Key: BEAM-8747 > URL: https://issues.apache.org/jira/browse/BEAM-8747 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Attachments: Guava used as fully-qualified class name.png > > Time Spent: 2h > Remaining Estimate: 0h > > [~kenn] says: > BeamModulePlugin just contains lists of versions to ease coordination across > Beam modules, but mostly does not create dependencies. Most of Beam's modules > only depend on a few things there. For example Guava is not a core > dependency, but here is where it is actually depended upon: > $ find . -name build.gradle | xargs grep library.java.guava > ./sdks/java/core/build.gradle: shadowTest library.java.guava_testlib > ./sdks/java/extensions/sql/jdbc/build.gradle: compile library.java.guava > ./sdks/java/io/google-cloud-platform/build.gradle: compile library.java.guava > ./sdks/java/io/kinesis/build.gradle: testCompile library.java.guava_testlib > These results appear to be misleading. Grepping for 'import > com.google.common', I see this as the actual state of things: > - GCP connector does not appear to actually depend on Guava in compile scope > - The Beam SQL JDBC driver does not appear to actually depend on Guava in > compile scope > - The Dataflow Java worker does depend on Guava at compile scope but has > incorrect dependencies (and it probably shouldn't) > - KinesisIO does depend on Guava at compile scope but has incorrect > dependencies (Kinesis libs have Guava on API surface so it is OK here, but > should be correctly declared) > - ZetaSQL translator does depend on Guava at compile scope but has incorrect > dependencies (ZetaSQL has it on API surface so it is OK here, but should be > correctly declared) > We used to have an analysis that prevented this class of error. > Once the errors are fixed, the guava_version is simply a version that we have > discovered that seems to work for both Kinesis and ZetaSQL, libraries we do > not control. Kinesis producer is built against 18.0. Kinesis client against > 26.0-jre. ZetaSQL against 26.0-android. > (or maybe I messed up in my analysis) > Kenn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=347897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347897 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 22/Nov/19 03:17 Start Date: 22/Nov/19 03:17 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #10175: [BEAM-7390] Add code snippet for Max URL: https://github.com/apache/beam/pull/10175#discussion_r349418508 ## File path: sdks/python/apache_beam/examples/snippets/transforms/aggregation/max.py ## @@ -0,0 +1,60 @@ +# coding=utf-8 Review comment: Yes, these are for the [transform catalog](https://beam.apache.org/documentation/transforms/python/overview/), so even if it is very similar content, it is useful to see examples for every single transform and how to use them in different circumstances. We already have the [element-wise transforms](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise) completed with their docs, code samples and interactive notebooks. ([example](https://beam.apache.org/documentation/transforms/python/elementwise/map/)) We are organizing them as one file per transform, for consistency with how the docs are organized. I was going to add reviewers after tests passed since Ahmet is OOO This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347897) Time Spent: 4h 20m (was: 4h 10m) > Colab examples for aggregation transforms (Python) > -- > > Key: BEAM-7390 > URL: https://issues.apache.org/jira/browse/BEAM-7390 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 4h 20m > Remaining Estimate: 0h > > Merge aggregation Colabs into the transform catalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785 ] Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 3:11 AM: - h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for Python3.6's _bz package suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} h2. testPy36Gcp failrue The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing: https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 It seemed that _bz2 library was missing for the Python3.6. Followed [Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558]. was (Author: suztomo): h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails. https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 It seemed that _bz2 library was missing for the Python3.6. Followed [Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558] apt-get install libbz2-dev > Python setup issues > --- > > Key: BEAM-8787 > URL: https://issues.apache.org/jira/browse/BEAM-8787 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 > Environment: debian x86 (gLinux) >Reporter: Elliotte Rusty Harold >Priority: Major > > This could be an issue with incomplete or inaccurate contributing docs. tldr; > `./gradlew check` fails on Debian after initial checkout. > The docs say that one should first run: > sudo apt-get install \ > openjdk-8-jdk \ > python-setuptools \ > python-pip \ > virtualenv > but even after running this pieces are missing. I'm still debugging exactly > what's missing but the sy
[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347895 ] ASF GitHub Bot logged work on BEAM-7594: Author: ASF GitHub Bot Created on: 22/Nov/19 03:10 Start Date: 22/Nov/19 03:10 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10194: [BEAM-7594] Fix flaky filename generation URL: https://github.com/apache/beam/pull/10194#discussion_r349417305 ## File path: sdks/python/apache_beam/io/textio_test.py ## @@ -101,17 +100,19 @@ def write_data( return f.name, [line.decode('utf-8') for line in all_data] -def write_pattern(lines_per_file, no_data=False): +def write_pattern(lines_per_file, no_data=False, return_filenames=False): Review comment: It might be nice to change all of the uses of the function, instead of conditionally changing the return type. But I won't block the PR on this : ) LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347895) Time Spent: 40m (was: 0.5h) > test_read_from_text_with_file_name_file_pattern is flaky > > > Key: BEAM-7594 > URL: https://issues.apache.org/jira/browse/BEAM-7594 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Critical > Labels: currently-failing, flake > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > cc: [~lcaggio] [~chamikara] > {noformat} > 22:05:08 > == > 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern > (apache_beam.io.textio_test.TextSourceTest) > 22:05:08 > -- > 22:05:08 Traceback (most recent call last): > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py", > line 517, in test_read_from_text_with_file_name_file_pattern > 22:05:08 pipeline.run() > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 22:05:08 else test_runner_api)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 22:05:08 self._options).run(False) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 22:05:08 return self.runner.run_pipeline(self, self._options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 22:05:08 return runner.run_pipeline(pipeline, options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 294, in run_pipeline > 22:05:08 default_environment=self._default_environment)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 301, in run_via_runner_api > 22:05:08 return self.run_stages(stage_context, stages) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 383, in run_stages > 22:05:08 stage_context.safe_coders) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 655, in _run_stage > 22:05:08 result, splits = bundle_manager.process_bundle(data_input, > data_output) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam
[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347896 ] ASF GitHub Bot logged work on BEAM-7594: Author: ASF GitHub Bot Created on: 22/Nov/19 03:10 Start Date: 22/Nov/19 03:10 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10194: [BEAM-7594] Fix flaky filename generation URL: https://github.com/apache/beam/pull/10194#issuecomment-557368681 LGTM except for the one comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347896) Time Spent: 50m (was: 40m) > test_read_from_text_with_file_name_file_pattern is flaky > > > Key: BEAM-7594 > URL: https://issues.apache.org/jira/browse/BEAM-7594 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Critical > Labels: currently-failing, flake > Fix For: Not applicable > > Time Spent: 50m > Remaining Estimate: 0h > > cc: [~lcaggio] [~chamikara] > {noformat} > 22:05:08 > == > 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern > (apache_beam.io.textio_test.TextSourceTest) > 22:05:08 > -- > 22:05:08 Traceback (most recent call last): > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py", > line 517, in test_read_from_text_with_file_name_file_pattern > 22:05:08 pipeline.run() > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 22:05:08 else test_runner_api)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 22:05:08 self._options).run(False) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 22:05:08 return self.runner.run_pipeline(self, self._options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 22:05:08 return runner.run_pipeline(pipeline, options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 294, in run_pipeline > 22:05:08 default_environment=self._default_environment)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 301, in run_via_runner_api > 22:05:08 return self.run_stages(stage_context, stages) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 383, in run_stages > 22:05:08 stage_context.safe_coders) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 655, in _run_stage > 22:05:08 result, splits = bundle_manager.process_bundle(data_input, > data_output) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1471, in process_bundle > 22:05:08 result_future = > self._controller.control_handler.push(process_bundle_req) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347894 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 22/Nov/19 03:07 Start Date: 22/Nov/19 03:07 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10070: [BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr… URL: https://github.com/apache/beam/pull/10070#discussion_r349416916 ## File path: sdks/python/apache_beam/transforms/util_test.py ## @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self): label='after reshuffle') pipeline.run() + @attr('ValidatesRunner') + def test_reshuffle_preserves_timestamps(self): +pipeline = TestPipeline() + +# Create a PCollection and assign each element with a different timestamp. +before_reshuffle = (pipeline +| "Four elements" >> beam.Create([ +{'name': 'foo', 'timestamp': MIN_TIMESTAMP}, +{'name': 'foo', 'timestamp': 0}, +{'name': 'bar', 'timestamp': 33}, +{'name': 'bar', 'timestamp': MAX_TIMESTAMP}, +]) +| "With timestamp" >> beam.Map( +lambda element: beam.window.TimestampedValue( +element, element['timestamp']))) + +# For each element in a PCollection, gets the current timestamp of the +# element and reassigns the timestamp to the element. +class AddTimestamp(beam.DoFn): + def process(self, element, timestamp=beam.DoFn.TimestampParam): +yield beam.window.TimestampedValue(element, timestamp) + +# Reshuffle the PCollection above and assign the timestamp of an element to +# that element again. +after_reshuffle = (before_reshuffle + | "Reshuffle" >> beam.Reshuffle() + | "With timestamps again" >> beam.ParDo(AddTimestamp())) + +# Given an element, emits a string which contains the timestamp and the name +# field of the element. +class FormatWithTimestamp(beam.DoFn): Review comment: Done. Thank you! It's good to know the code can be written in this way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347894) Time Spent: 18h 20m (was: 18h 10m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 18h 20m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-8651: -- Description: Several Beam users reported an intermittent error which happens during unpickling in StockUnpickler.find_class. A similar error happens consistently when user's pipelines have instances of super() in their main module, and use --save_main_session, see: [BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945]. In this case the error happens only sometimes, and super() calls don't play a role. So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink and Dataflow runners. On Dataflow runner so far I have seen this in streaming pipelines only, which use portable SDK worker. Typical stack trace: {noformat} File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation dofn_data = pickler.loads(serialized_fn) File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads return dill.loads(s) File "python3.5/site-packages/dill/_dill.py", line 317, in loads return load(file, ignore) File "python3.5/site-packages/dill/_dill.py", line 305, in load obj = pik.load() File "python3.5/site-packages/dill/_dill.py", line 474, in find_class return StockUnpickler.find_class(self, module, name) AttributeError: Can't get attribute 'ClassName' on {noformat} According to Guenther from [1]: {quote} This looks exactly like a race condition that we've encountered on Python 3.7.1: There's a bug in some older 3.7.x releases that breaks the thread-safety of the unpickler, as concurrent unpickle threads can access a module before it has been fully imported. See https://bugs.python.org/issue34572 for more information. The traceback shows a Python 3.6 venv so this could be a different issue (the unpickle bug was introduced in version 3.7). If it's the same bug then upgrading to Python 3.7.3 or higher should fix that issue. One potential workaround is to ensure that all of the modules get imported during the initialization of the sdk_worker, as this bug only affects imports done by the unpickler. {quote} Opening this for visibility. Current open questions are: 1. Find a minimal example to reproduce this issue. 2. Figure out whether users are still affected by this issue on Python 3.7.3. 3. Communicate a workarounds for 3.5, 3.6 users affected by this. [1] https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E was: Several Beam users [1,2] reported an error which happens on Python 3 in StockUnpickler.find_class. So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink and Dataflow runners. On Dataflow runner so far I have seen this in streaming pipelines only, which use portable SDK worker. Typical stack trace: {noformat} File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation dofn_data = pickler.loads(serialized_fn) File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads return dill.loads(s) File "python3.5/site-packages/dill/_dill.py", line 317, in loads return load(file, ignore) File "python3.5/site-packages/dill/_dill.py", line 305, in load obj = pik.load() File "python3.5/site-packages/dill/_dill.py", line 474, in find_class return StockUnpickler.find_class(self, module, name) AttributeError: Can't get attribute 'ClassName' on {noformat} According to Guenther from [1]: {quote} This looks exactly like a race condition that we've encountered on Python 3.7.1: There's a bug in some older 3.7.x releases that breaks the thread-safety of the unpickler, as concurrent unpickle threads can access a module before it has been fully imported. See https://bugs.python.org/issue34572 for more information. The traceback shows a Python 3.6 venv so this could be a different issue (the unpickle bug was introduced in version 3.7). If it's the same bug then upgrading to
[jira] [Commented] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always
[ https://issues.apache.org/jira/browse/BEAM-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979799#comment-16979799 ] Pablo Estrada commented on BEAM-8803: - Hm This is very troublesome. I'm writing a fix to make RETRY_ALWAYS the default behavior, and throw an error when there are errors inserting rows - and produce a deadletter pcollection for other scenarios. > Default behaviour for Python BQ Streaming inserts sink should be to retry > always > > > Key: BEAM-8803 > URL: https://issues.apache.org/jira/browse/BEAM-8803 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=347893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347893 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 22/Nov/19 03:01 Start Date: 22/Nov/19 03:01 Worklog Time Spent: 10m Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#issuecomment-557366844 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347893) Time Spent: 50m (was: 40m) > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7198) Rename ToStringCoder into ToBytesCoder
[ https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-7198: -- Labels: easy-fix starter (was: new) > Rename ToStringCoder into ToBytesCoder > -- > > Key: BEAM-7198 > URL: https://issues.apache.org/jira/browse/BEAM-7198 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Minor > Labels: easy-fix, starter > > The name of ToStringCoder class [1] is confusing, since the output of > encode() on Python3 will be bytes. On Python 2 the output is also bytes, > since bytes and string are synonyms on Py2. > ToBytesCoder would be a better name for this class. > Note that this class is not listed in coders that constitute Public APIs [2], > so we can treat this as internal change. As a courtesy to users who happened > to reference a non-public coder in their pipelines we can keep the old class > name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but > clean up Beam codeabase to use the new name. > [1] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344 > [2] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20 > cc: [~yoshiki.obata] [~chamikara] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785 ] Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 2:56 AM: - h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails. https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 It seemed that _bz2 library was missing for the Python3.6. Followed [Stackoverflow: No module named '_bz2' in python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558] apt-get install libbz2-dev was (Author: suztomo): h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails. https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 > Python setup issues > --- > > Key: BEAM-8787 > URL: https://issues.apache.org/jira/browse/BEAM-8787 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 > Environment: debian x86 (gLinux) >Reporter: Elliotte Rusty Harold >Priority: Major > > This could be an issue with incomplete or inaccurate contributing docs. tldr; > `./gradlew check` fails on Debian after initial checkout. > The docs say that one should first run: > sudo apt-get install \ > openjdk-8-jdk \ > python-setuptools \ > python-pip \ > virtualenv > but even after running this pieces are missing. I'm still debugging exactly > what's missing but the symptoms look like this: > > Task :sdks:python:test-suites:tox:py35:setupVirtualenv FAILED > The path python3.5 (from --python=python3.5) does not exist > > Task :sdks:python:test-suites:tox:py36:setupVirtualenv FAILED > [ant:fmpp] Traceback (most recent call last): > [ant:fmpp] File "/usr/lib/python3/dist-packages/virtualenv.py", line
[jira] [Updated] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always
[ https://issues.apache.org/jira/browse/BEAM-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-8803: Status: Open (was: Triage Needed) > Default behaviour for Python BQ Streaming inserts sink should be to retry > always > > > Key: BEAM-8803 > URL: https://issues.apache.org/jira/browse/BEAM-8803 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always
[ https://issues.apache.org/jira/browse/BEAM-8803?focusedWorklogId=347891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347891 ] ASF GitHub Bot logged work on BEAM-8803: Author: ASF GitHub Bot Created on: 22/Nov/19 02:55 Start Date: 22/Nov/19 02:55 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10195: [BEAM-8803] BigQuery Streaming Inserts are always retried by default. URL: https://github.com/apache/beam/pull/10195 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badg
[jira] [Commented] (BEAM-7198) Rename ToStringCoder into ToBytesCoder
[ https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979792#comment-16979792 ] Valentyn Tymofieiev commented on BEAM-7198: --- I'll unassign this issue for now, but feel free to pick it up later if you have time. > Rename ToStringCoder into ToBytesCoder > -- > > Key: BEAM-7198 > URL: https://issues.apache.org/jira/browse/BEAM-7198 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Francesco Perera >Priority: Minor > Labels: new > > The name of ToStringCoder class [1] is confusing, since the output of > encode() on Python3 will be bytes. On Python 2 the output is also bytes, > since bytes and string are synonyms on Py2. > ToBytesCoder would be a better name for this class. > Note that this class is not listed in coders that constitute Public APIs [2], > so we can treat this as internal change. As a courtesy to users who happened > to reference a non-public coder in their pipelines we can keep the old class > name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but > clean up Beam codeabase to use the new name. > [1] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344 > [2] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20 > cc: [~yoshiki.obata] [~chamikara] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7198) Rename ToStringCoder into ToBytesCoder
[ https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reassigned BEAM-7198: - Assignee: (was: Francesco Perera) > Rename ToStringCoder into ToBytesCoder > -- > > Key: BEAM-7198 > URL: https://issues.apache.org/jira/browse/BEAM-7198 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Minor > Labels: new > > The name of ToStringCoder class [1] is confusing, since the output of > encode() on Python3 will be bytes. On Python 2 the output is also bytes, > since bytes and string are synonyms on Py2. > ToBytesCoder would be a better name for this class. > Note that this class is not listed in coders that constitute Public APIs [2], > so we can treat this as internal change. As a courtesy to users who happened > to reference a non-public coder in their pipelines we can keep the old class > name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but > clean up Beam codeabase to use the new name. > [1] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344 > [2] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20 > cc: [~yoshiki.obata] [~chamikara] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7198) Rename ToStringCoder into ToBytesCoder
[ https://issues.apache.org/jira/browse/BEAM-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-7198: -- Labels: new (was: ) > Rename ToStringCoder into ToBytesCoder > -- > > Key: BEAM-7198 > URL: https://issues.apache.org/jira/browse/BEAM-7198 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Francesco Perera >Priority: Minor > Labels: new > > The name of ToStringCoder class [1] is confusing, since the output of > encode() on Python3 will be bytes. On Python 2 the output is also bytes, > since bytes and string are synonyms on Py2. > ToBytesCoder would be a better name for this class. > Note that this class is not listed in coders that constitute Public APIs [2], > so we can treat this as internal change. As a courtesy to users who happened > to reference a non-public coder in their pipelines we can keep the old class > name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but > clean up Beam codeabase to use the new name. > [1] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344 > [2] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20 > cc: [~yoshiki.obata] [~chamikara] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347888 ] ASF GitHub Bot logged work on BEAM-7948: Author: ASF GitHub Bot Created on: 22/Nov/19 02:49 Start Date: 22/Nov/19 02:49 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s… URL: https://github.com/apache/beam/pull/9949#issuecomment-557364094 Thanks for the review @lukecwik 👍 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347888) Time Spent: 3.5h (was: 3h 20m) > Add time-based cache threshold support in the Java data service > --- > > Key: BEAM-7948 > URL: https://issues.apache.org/jira/browse/BEAM-7948 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently only size-based cache threshold is supported in data service. It > should also support the time-based cache threshold. This is very important, > especially for streaming jobs which are sensitive to the delay. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785 ] Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 2:38 AM: - h1. Problem The problem for my environment was that Python3.6 was missing required module {{distutils.sysconfig}} and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails. https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 was (Author: suztomo): h1. Problem The problem for my environment was Python3.6 was missing required module and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails. https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 > Python setup issues > --- > > Key: BEAM-8787 > URL: https://issues.apache.org/jira/browse/BEAM-8787 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 > Environment: debian x86 (gLinux) >Reporter: Elliotte Rusty Harold >Priority: Major > > This could be an issue with incomplete or inaccurate contributing docs. tldr; > `./gradlew check` fails on Debian after initial checkout. > The docs say that one should first run: > sudo apt-get install \ > openjdk-8-jdk \ > python-setuptools \ > python-pip \ > virtualenv > but even after running this pieces are missing. I'm still debugging exactly > what's missing but the symptoms look like this: > > Task :sdks:python:test-suites:tox:py35:setupVirtualenv FAILED > The path python3.5 (from --python=python3.5) does not exist > > Task :sdks:python:test-suites:tox:py36:setupVirtualenv FAILED > [ant:fmpp] Traceback (most recent call last): > [ant:fmpp] File "/usr/lib/python3/dist-packages/virtualenv.py", line 25, in > > [ant:fmpp] import distutils.sysconfig > [ant:fmpp] ModuleNotFoundError: No module named 'distutils.sysconfig' > ... > FAILURE: Build completed with 2 failures. > 1: Task failed with an exception. > --- > * What went wrong: > Execution fail
[jira] [Commented] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785 ] Tomo Suzuki commented on BEAM-8787: --- h1. Problem The problem for my environment was Python3.6 was missing required module and the latest python3-disutils does not support Python3.6. h1. Solution Build Python from the source: {noformat} suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 https://github.com/python/cpython.git ... suztomo@suxtomo24:/tmp$ cd cpython suztomo@suxtomo24:/tmp/cpython$ git status Not currently on any branch. nothing to commit, working tree clean suztomo@suxtomo24:/tmp/cpython$ git log -1 commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8) Author: Ned Deily Date: Sun Dec 23 16:37:14 2018 -0500 3.6.8final suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your preference ... suztomo@suxtomo24:/tmp/cpython$ make install {noformat} Add the directory to the path with "/bin" appended. In {{~/.bashrc}}: {noformat} export PATH=$HOME/local/bin:$PATH {noformat} Now disutils.sysconfig module is available for Python3.6: {noformat} suztomo@suxtomo24:/tmp/cpython$ python3.6 Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) [GCC 8.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from distutils import sysconfig >>> {noformat} Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds {noformat} suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 setupVirtualenv ... > Task :sdks:python:test-suites:tox:py35:setupVirtualenv ... BUILD SUCCESSFUL in 5s {noformat} Next step: {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} fails. https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053 > Python setup issues > --- > > Key: BEAM-8787 > URL: https://issues.apache.org/jira/browse/BEAM-8787 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 > Environment: debian x86 (gLinux) >Reporter: Elliotte Rusty Harold >Priority: Major > > This could be an issue with incomplete or inaccurate contributing docs. tldr; > `./gradlew check` fails on Debian after initial checkout. > The docs say that one should first run: > sudo apt-get install \ > openjdk-8-jdk \ > python-setuptools \ > python-pip \ > virtualenv > but even after running this pieces are missing. I'm still debugging exactly > what's missing but the symptoms look like this: > > Task :sdks:python:test-suites:tox:py35:setupVirtualenv FAILED > The path python3.5 (from --python=python3.5) does not exist > > Task :sdks:python:test-suites:tox:py36:setupVirtualenv FAILED > [ant:fmpp] Traceback (most recent call last): > [ant:fmpp] File "/usr/lib/python3/dist-packages/virtualenv.py", line 25, in > > [ant:fmpp] import distutils.sysconfig > [ant:fmpp] ModuleNotFoundError: No module named 'distutils.sysconfig' > ... > FAILURE: Build completed with 2 failures. > 1: Task failed with an exception. > --- > * What went wrong: > Execution failed for task ':sdks:python:test-suites:tox:py35:setupVirtualenv'. > > Process 'command 'virtualenv'' finished with non-zero exit value 3 > Indeed there is no Python 3.5 on this system: > gnome-user-share python2.6 > gnome-vfs-2.0 python2.7 > gnupg python3 > gnupg2python3.6 > gold-ld python3.7 > goobuntu-config-tools python3.8 > But nowhere in the setup docs do we say that Python 3.5 is required to build > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974802#comment-16974802 ] Valentyn Tymofieiev edited comment on BEAM-1251 at 11/22/19 2:11 AM: - We are investigating an issue BEAM-8651 where Python 3 pipelines _sometimes_ fail with pickling errors in StockUnpickler.find_class(). Positing it here for visibility since it seems to be common in certain execution scenarios. Note that this is different from [BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945], which causes _consistent_ failures in StockUnpickler.find_class, when --save_main_session is used and main module has super() calls. was (Author: tvalentyn): We are investigating an issue BEAM-8651 where Python 3 pipelines sometimes fail with pickling errors in StockUnpickler.find_class(). Positing it here for visibility since it seems to be common in certain execution scenarios. > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.11.0 > > Time Spent: 30h 10m > Remaining Estimate: 0h > > I have been trying to use google datalab with python3. As I see there are > several packages that does not support python3 yet which google datalab > depends on. This is one of them. > https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev resolved BEAM-5878. --- Resolution: Fixed > Support DoFns with Keyword-only arguments in Python 3. > -- > > Key: BEAM-5878 > URL: https://issues.apache.org/jira/browse/BEAM-5878 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Fix For: 2.18.0 > > Time Spent: 16.5h > Remaining Estimate: 0h > > Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to > define functions with keyword-only arguments. > Currently Beam does not handle them correctly. [~ruoyu] pointed out [one > place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] > in our codebase that we should fix: in Python in 3.0 inspect.getargspec() > will fail on functions with keyword-only arguments, but a new method > [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] > supports them. > There may be implications for our (best-effort) type-hints machinery. > We should also add a Py3-only unit tests that covers DoFn's with keyword-only > arguments once Beam Python 3 tests are in a good shape. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-5878: -- Fix Version/s: 2.18.0 > Support DoFns with Keyword-only arguments in Python 3. > -- > > Key: BEAM-5878 > URL: https://issues.apache.org/jira/browse/BEAM-5878 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Fix For: 2.18.0 > > Time Spent: 16.5h > Remaining Estimate: 0h > > Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to > define functions with keyword-only arguments. > Currently Beam does not handle them correctly. [~ruoyu] pointed out [one > place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] > in our codebase that we should fix: in Python in 3.0 inspect.getargspec() > will fail on functions with keyword-only arguments, but a new method > [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] > supports them. > There may be implications for our (best-effort) type-hints machinery. > We should also add a Py3-only unit tests that covers DoFn's with keyword-only > arguments once Beam Python 3 tests are in a good shape. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .
[ https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945 ] Valentyn Tymofieiev edited comment on BEAM-6158 at 11/22/19 2:01 AM: - The error is happens when main pipeline module has class methods that refer to superclass methods using super(). A reference to super in the method code creates a cyclical reference inside the object, which dill currently handles via pickling objects by reference. Such approach does not work for restoring a pickled a main session, since object classes need to be defined at the moment of unpickling . This issue will be addressed after [https://github.com/uqfoundation/dill/issues/300]. is fixed or we start using CloudPickle as a pickler, which is investigated in BEAM-8123. In the meantime following workarounds are available: - [restructure the pipeline|https://stackoverflow.com/a/58845832] so that the pipeline code does not depend on the entities defined in the main module, and don't pass --save_main_session. - don't use super() in the main module. - refer to superclass methods in the main module via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies. [Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43]. was (Author: tvalentyn): The error is happens when main pipeline module has class methods that refer to superclass methods using super(). A reference to super in the method code creates a cyclical reference inside the object, which dill currently handles via pickling objects by reference. Such approach does not work for restoring a pickled a main session, since object classes need to be defined at the moment of unpickling . This issue will be addressed after [https://github.com/uqfoundation/dill/issues/300]. is fixed or we start using CloudPickle as a pickler, which is investigated in BEAM-8123. In the meantime following workarounds are available: - don't use super() in the main module. - [restructure the pipeline|https://stackoverflow.com/a/58845832] so that the pipeline code does not depend on the entities defined in the main module, and don't pass --save_main_session. - refer to superclass methods in the main module via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies. [Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43]. > Using --save_main_session fails on Python 3 when main module has invocations > of superclass method using 'super' . > - > > Key: BEAM-6158 > URL: https://issues.apache.org/jira/browse/BEAM-6158 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Mark Liu >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > A typical manifestation of this failure, which can be observed on several > Beam examples: > {noformat} > Traceback (most recent call last): > File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main > "__main__", mod_spec) > File "/usr/lib/python3.5/runpy.py", line 85, in _run_code > exec(code, run_globals) > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", > line 164, in > run() > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", > line 158, in run > | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output)) > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > > self.run().wait_until_finish() > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", > line 1338, in wait_until_finish > (self.state, getattr(self._runner, 'last_error_msg', None)), self) > apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: > Dataflow pipeline failed. State: FAILED, Error: > > Traceback (most recent call last): > File > "/usr/local/lib/python3.5/sit
[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .
[ https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945 ] Valentyn Tymofieiev edited comment on BEAM-6158 at 11/22/19 1:50 AM: - The error is happens when main pipeline module has class methods that refer to superclass methods using super(). A reference to super in the method code creates a cyclical reference inside the object, which dill currently handles via pickling objects by reference. Such approach does not work for restoring a pickled a main session, since object classes need to be defined at the moment of unpickling . This issue will be addressed after [https://github.com/uqfoundation/dill/issues/300]. is fixed or we start using CloudPickle as a pickler, which is investigated in BEAM-8123. In the meantime following workarounds are available: - don't use super() in the main module. - [restructure the pipeline|https://stackoverflow.com/a/58845832] so that the pipeline code does not depend on the entities defined in the main module, and don't pass --save_main_session. - refer to superclass methods in the main module via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies. [Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43]. was (Author: tvalentyn): The error is happens when main pipeline module has class methods that refer to superclass methods using super(). A reference to super in the method code creates a cyclical reference inside the object, which dill currently handles via pickling objects by reference. Such approach does not work for restoring a pickled a main session, since object classes need to be defined at the moment of unpickling . This issue will be addressed after [https://github.com/uqfoundation/dill/issues/300]. is fixed or we start using CloudPickle as a pickler, which is investigated in BEAM-8123. In the meantime following workarounds are available: - don't use super() in the main module. - restructure the pipeline so that the pipeline code does not depend on the entities defined in the main module, and don't pass --save_main_session. - refer to superclass methods in the main module via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies. [Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43]. > Using --save_main_session fails on Python 3 when main module has invocations > of superclass method using 'super' . > - > > Key: BEAM-6158 > URL: https://issues.apache.org/jira/browse/BEAM-6158 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Mark Liu >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > A typical manifestation of this failure, which can be observed on several > Beam examples: > {noformat} > Traceback (most recent call last): > File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main > "__main__", mod_spec) > File "/usr/lib/python3.5/runpy.py", line 85, in _run_code > exec(code, run_globals) > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", > line 164, in > run() > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", > line 158, in run > | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output)) > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > > self.run().wait_until_finish() > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", > line 1338, in wait_until_finish > (self.state, getattr(self._runner, 'last_error_msg', None)), self) > apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: > Dataflow pipeline failed. State: FAILED, Error: > > Traceback (most recent call last): > File > "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.p
[jira] [Assigned] (BEAM-7952) Make the input queue of the input buffer in Python SDK Harness size limited.
[ https://issues.apache.org/jira/browse/BEAM-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng reassigned BEAM-7952: - Assignee: sunjincheng > Make the input queue of the input buffer in Python SDK Harness size limited. > > > Key: BEAM-7952 > URL: https://issues.apache.org/jira/browse/BEAM-7952 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > > At Python SDK harness, the input queue size of the input buffer in Python SDK > Harness is not size limited and also not configurable. This may become a > problem if the data production rate is more than the data consumption rate. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py
[ https://issues.apache.org/jira/browse/BEAM-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979765#comment-16979765 ] sunjincheng commented on BEAM-8733: --- I want to confirm a few things with you before making changes as I'm still not quite familiar with the Beam. Per my understanding, the registration in the Java SDK harness is also asynchronous(https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/BeamFnControlClient.java#L138). Have I missed something? (I am not arguing, just want to have the correct understanding) :) > The "KeyError: u'-47'" error from line 305 of sdk_worker.py > --- > > Key: BEAM-8733 > URL: https://issues.apache.org/jira/browse/BEAM-8733 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: sunjincheng >Priority: Major > Fix For: 2.18.0 > > > The issue reported by [~chamikara], error message as follows: > apache_beam/runners/worker/sdk_worker.py", line 305, in get > self.fns[bundle_descriptor_id], > KeyError: u'-47' > {code} > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57) > at > org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85) > at > org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195) > at > org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123) > Suppressed: java.lang.IllegalStateException: Already closed. > at > org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93) > at > org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91) > {code} > More discussion info can be found here: > https://github.com/apache/beam/pull/10004 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py
[ https://issues.apache.org/jira/browse/BEAM-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng reassigned BEAM-8733: - Assignee: sunjincheng > The "KeyError: u'-47'" error from line 305 of sdk_worker.py > --- > > Key: BEAM-8733 > URL: https://issues.apache.org/jira/browse/BEAM-8733 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.18.0 > > > The issue reported by [~chamikara], error message as follows: > apache_beam/runners/worker/sdk_worker.py", line 305, in get > self.fns[bundle_descriptor_id], > KeyError: u'-47' > {code} > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57) > at > org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85) > at > org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195) > at > org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123) > Suppressed: java.lang.IllegalStateException: Already closed. > at > org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93) > at > org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91) > {code} > More discussion info can be found here: > https://github.com/apache/beam/pull/10004 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .
[ https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945 ] Valentyn Tymofieiev edited comment on BEAM-6158 at 11/22/19 1:40 AM: - The error is happens when main pipeline module has class methods that refer to superclass methods using super(). A reference to super in the method code creates a cyclical reference inside the object, which dill currently handles via pickling objects by reference. Such approach does not work for restoring a pickled a main session, since object classes need to be defined at the moment of unpickling . This issue will be addressed after [https://github.com/uqfoundation/dill/issues/300]. is fixed or we start using CloudPickle as a pickler, which is investigated in BEAM-8123. In the meantime following workarounds are available: - don't use super() in the main module. - restructure the pipeline so that the pipeline code does not depend on the entities defined in the main module, and don't pass --save_main_session. - refer to superclass methods in the main module via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies. [Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43]. was (Author: tvalentyn): The error is happens when main pipeline module has class methods that refer to superclass methods using super(). A reference to super in the method code creates a cyclical reference inside the object, which dill currently handles via pickling objects by reference. Such approach does not work for restoring a pickled a main session, since object classes need to be defined at the moment of unpickling . This issue will be addressed after https://github.com/uqfoundation/dill/issues/300. is fixed or we start using CloudPickle as a pickler, which is investigated in BEAM-8123. In the meantime following workarounds are available: - don't use super() in the main module. - refer to superclass methods via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies. > Using --save_main_session fails on Python 3 when main module has invocations > of superclass method using 'super' . > - > > Key: BEAM-6158 > URL: https://issues.apache.org/jira/browse/BEAM-6158 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Mark Liu >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > A typical manifestation of this failure, which can be observed on several > Beam examples: > {noformat} > Traceback (most recent call last): > File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main > "__main__", mod_spec) > File "/usr/lib/python3.5/runpy.py", line 85, in _run_code > exec(code, run_globals) > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", > line 164, in > run() > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", > line 158, in run > | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output)) > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > > self.run().wait_until_finish() > File > "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", > line 1338, in wait_until_finish > (self.state, getattr(self._runner, 'last_error_msg', None)), self) > apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: > Dataflow pipeline failed. State: FAILED, Error: > > Traceback (most recent call last): > File > "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line > 773, in run > self._load_main_session(self.local_staging_directory) > File > "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line > 489, in _load_main_session > > pickler.load_session(session_file) >
[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347872 ] ASF GitHub Bot logged work on BEAM-7594: Author: ASF GitHub Bot Created on: 22/Nov/19 01:35 Start Date: 22/Nov/19 01:35 Worklog Time Spent: 10m Work Description: udim commented on issue #10194: [BEAM-7594] Fix flaky filename generation URL: https://github.com/apache/beam/pull/10194#issuecomment-557348453 CC: @tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347872) Time Spent: 0.5h (was: 20m) > test_read_from_text_with_file_name_file_pattern is flaky > > > Key: BEAM-7594 > URL: https://issues.apache.org/jira/browse/BEAM-7594 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Critical > Labels: currently-failing, flake > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > cc: [~lcaggio] [~chamikara] > {noformat} > 22:05:08 > == > 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern > (apache_beam.io.textio_test.TextSourceTest) > 22:05:08 > -- > 22:05:08 Traceback (most recent call last): > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py", > line 517, in test_read_from_text_with_file_name_file_pattern > 22:05:08 pipeline.run() > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 22:05:08 else test_runner_api)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 22:05:08 self._options).run(False) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 22:05:08 return self.runner.run_pipeline(self, self._options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 22:05:08 return runner.run_pipeline(pipeline, options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 294, in run_pipeline > 22:05:08 default_environment=self._default_environment)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 301, in run_via_runner_api > 22:05:08 return self.run_stages(stage_context, stages) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 383, in run_stages > 22:05:08 stage_context.safe_coders) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 655, in _run_stage > 22:05:08 result, splits = bundle_manager.process_bundle(data_input, > data_output) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1471, in process_bundle > 22:05:08 result_future = > self._controller.control_handler.push(process_bundle_req) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 990, in pu
[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347871 ] ASF GitHub Bot logged work on BEAM-7594: Author: ASF GitHub Bot Created on: 22/Nov/19 01:34 Start Date: 22/Nov/19 01:34 Worklog Time Spent: 10m Work Description: udim commented on issue #10194: [BEAM-7594] Fix flaky filename generation URL: https://github.com/apache/beam/pull/10194#issuecomment-557348291 R: @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347871) Time Spent: 20m (was: 10m) > test_read_from_text_with_file_name_file_pattern is flaky > > > Key: BEAM-7594 > URL: https://issues.apache.org/jira/browse/BEAM-7594 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Critical > Labels: currently-failing, flake > Fix For: Not applicable > > Time Spent: 20m > Remaining Estimate: 0h > > cc: [~lcaggio] [~chamikara] > {noformat} > 22:05:08 > == > 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern > (apache_beam.io.textio_test.TextSourceTest) > 22:05:08 > -- > 22:05:08 Traceback (most recent call last): > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py", > line 517, in test_read_from_text_with_file_name_file_pattern > 22:05:08 pipeline.run() > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 22:05:08 else test_runner_api)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 22:05:08 self._options).run(False) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 22:05:08 return self.runner.run_pipeline(self, self._options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 22:05:08 return runner.run_pipeline(pipeline, options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 294, in run_pipeline > 22:05:08 default_environment=self._default_environment)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 301, in run_via_runner_api > 22:05:08 return self.run_stages(stage_context, stages) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 383, in run_stages > 22:05:08 stage_context.safe_coders) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 655, in _run_stage > 22:05:08 result, splits = bundle_manager.process_bundle(data_input, > data_output) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1471, in process_bundle > 22:05:08 result_future = > self._controller.control_handler.push(process_bundle_req) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 990, in push >
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347869 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 22/Nov/19 01:31 Start Date: 22/Nov/19 01:31 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10185: [BEAM-8651] Cherrypick PR #10167 to the release branch. URL: https://github.com/apache/beam/pull/10185#issuecomment-557347567 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347869) Time Spent: 2h 50m (was: 2h 40m) > Python 3 portable pipelines sometimes fail with errors in > StockUnpickler.find_class() > - > > Key: BEAM-8651 > URL: https://issues.apache.org/jira/browse/BEAM-8651 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.17.0 > > Attachments: beam8651.py > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Several Beam users [1,2] reported an error which happens on Python 3 in > StockUnpickler.find_class. > So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink > and Dataflow runners. On Dataflow runner so far I have seen this in streaming > pipelines only, which use portable SDK worker. > Typical stack trace: > {noformat} > File > "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 1148, in _create_pardo_operation > dofn_data = pickler.loads(serialized_fn) > > File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, > in loads > return dill.loads(s) > > File "python3.5/site-packages/dill/_dill.py", line 317, in loads > > return load(file, ignore) > > File "python3.5/site-packages/dill/_dill.py", line 305, in load > > obj = pik.load() > > File "python3.5/site-packages/dill/_dill.py", line 474, in find_class > > return StockUnpickler.find_class(self, module, name) > > AttributeError: Can't get attribute 'ClassName' on 'python3.5/site-packages/filename.py'> > {noformat} > According to Guenther from [1]: > {quote} > This looks exactly like a race condition that we've encountered on Python > 3.7.1: There's a bug in some older 3.7.x releases that breaks the > thread-safety of the unpickler, as concurrent unpickle threads can access a > module before it has been fully imported. See > https://bugs.python.org/issue34572 for more information. > The traceback shows a Python 3.6 venv so this could be a different issue > (the unpickle bug was introduced in version 3.7). If it's the same bug then > upgrading to Python 3.7.3 or higher should fix that issue. One potential > workaround is to ensure that all of the modules get imported during the > initialization of the sdk_worker, as this bug only affects imports done by > the unpickler. > {quote} > Opening this for visibility. Current open questions are: > 1. Find a minimal example to reproduce this issue. > 2. Figure out whether users are still affected by this issue on Python 3.7.3. > 3. Communicate a workarounds for 3.5, 3.6 users affected by this. > [1] > https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=347870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347870 ] ASF GitHub Bot logged work on BEAM-7594: Author: ASF GitHub Bot Created on: 22/Nov/19 01:31 Start Date: 22/Nov/19 01:31 Worklog Time Spent: 10m Work Description: udim commented on pull request #10194: [BEAM-7594] Fix flaky filename generation URL: https://github.com/apache/beam/pull/10194 See [this comment](https://issues.apache.org/jira/browse/BEAM-7594?focusedCommentId=16979737&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16979737) for description of flake. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge
[jira] [Assigned] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-7594: --- Assignee: Udi Meiri > test_read_from_text_with_file_name_file_pattern is flaky > > > Key: BEAM-7594 > URL: https://issues.apache.org/jira/browse/BEAM-7594 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Critical > Labels: currently-failing, flake > Fix For: Not applicable > > > cc: [~lcaggio] [~chamikara] > {noformat} > 22:05:08 > == > 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern > (apache_beam.io.textio_test.TextSourceTest) > 22:05:08 > -- > 22:05:08 Traceback (most recent call last): > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py", > line 517, in test_read_from_text_with_file_name_file_pattern > 22:05:08 pipeline.run() > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 22:05:08 else test_runner_api)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 22:05:08 self._options).run(False) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 22:05:08 return self.runner.run_pipeline(self, self._options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 22:05:08 return runner.run_pipeline(pipeline, options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 294, in run_pipeline > 22:05:08 default_environment=self._default_environment)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 301, in run_via_runner_api > 22:05:08 return self.run_stages(stage_context, stages) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 383, in run_stages > 22:05:08 stage_context.safe_coders) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 655, in _run_stage > 22:05:08 result, splits = bundle_manager.process_bundle(data_input, > data_output) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1471, in process_bundle > 22:05:08 result_future = > self._controller.control_handler.push(process_bundle_req) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 990, in push > 22:05:08 response = self.worker.do_instruction(request) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", > line 342, in do_instruction > 22:05:08 request.instruction_id) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", > line 368, in process_bundle > 22:05:08 bundle_processor.process_bundle(instruction_id)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py", > line 593, in process_bundle > 22:05:08 dat
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347844 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 22/Nov/19 00:57 Start Date: 22/Nov/19 00:57 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035#issuecomment-557339905 Run Python Precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347844) Time Spent: 4.5h (was: 4h 20m) > Python SDK labels ontime empty panes as late > > > Key: BEAM-8581 > URL: https://issues.apache.org/jira/browse/BEAM-8581 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > The GeneralTriggerDriver does not put watermark holds on timers, leading to > the ontime empty pane being considered late data. > Fix: Add a new notion of whether a trigger has an ontime pane. If it does, > then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979737#comment-16979737 ] Udi Meiri edited comment on BEAM-7594 at 11/22/19 12:52 AM: Another failure. This time I noticed that there 2 failures in 2 different tox environments (py35 and py36): {code} 10:52:42 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/util.py", line 144, in _equal 10:52:42 'Failed assert: %r == %r' % (expected, actual)) 10:52:42 apache_beam.testing.util.BeamAssertException: Failed assert: [('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 'line4'), ('/tmp/20191121183751l7ue7cnm', 'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4'), ('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), ('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 'line4'), ('/tmp/20191121183751ahzccono', 'line0'), ('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 'line2'), ('/tmp/20191121183751ahzccono', 'line3'), ('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 'line4')] [while running 'assert_that/Match'] {code} {code} 10:52:18 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/testing/util.py", line 144, in _equal 10:52:18 'Failed assert: %r == %r' % (expected, actual)) 10:52:18 apache_beam.testing.util.BeamAssertException: Failed assert: [('/tmp/20191121183751ahzccono', 'line0'), ('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 'line2'), ('/tmp/20191121183751ahzccono', 'line3'), ('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), ('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4'), ('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), ('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 'line4'), ('/tmp/20191121183751ahzccono', 'line0'), ('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 'line2'), ('/tmp/20191121183751ahzccono', 'line3'), ('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 'line4')] [while running 'assert_that/Match'] {code} All the filenames above share the prefix: '/tmp/20191121183751'. The code for this prefix: {code} prefix = datetime.datetime.now().strftime("%Y%m%d%H%M%S") {code} was (Author: udim): Another failure. This time I noticed that there 2 failures in 2 different tox environments (py35 and py36): {code} 10:52:42 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/util.py", line 144, in _equal 10:52:42 'Failed assert: %r == %r' % (expected, actual)) 10:52:42 apache_beam.testing.util.BeamAssertException: Failed assert: [('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 'line4'), ('/tmp/20191121183751l7ue7cnm', 'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4')
[jira] [Commented] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979737#comment-16979737 ] Udi Meiri commented on BEAM-7594: - Another failure. This time I noticed that there 2 failures in 2 different tox environments (py35 and py36): {code} 10:52:42 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/util.py", line 144, in _equal 10:52:42 'Failed assert: %r == %r' % (expected, actual)) 10:52:42 apache_beam.testing.util.BeamAssertException: Failed assert: [('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 'line4'), ('/tmp/20191121183751l7ue7cnm', 'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4'), ('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), ('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 'line4'), ('/tmp/20191121183751ahzccono', 'line0'), ('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 'line2'), ('/tmp/20191121183751ahzccono', 'line3'), ('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 'line4')] [while running 'assert_that/Match'] {code} {code} 10:52:18 File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py36/build/srcs/sdks/python/apache_beam/testing/util.py", line 144, in _equal 10:52:18 'Failed assert: %r == %r' % (expected, actual)) 10:52:18 apache_beam.testing.util.BeamAssertException: Failed assert: [('/tmp/20191121183751ahzccono', 'line0'), ('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 'line2'), ('/tmp/20191121183751ahzccono', 'line3'), ('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), ('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 'line4')] == [('/tmp/20191121183751l7ue7cnm', 'line0'), ('/tmp/20191121183751l7ue7cnm', 'line1'), ('/tmp/20191121183751l7ue7cnm', 'line2'), ('/tmp/20191121183751l7ue7cnm', 'line3'), ('/tmp/20191121183751l7ue7cnm', 'line4'), ('/tmp/201911211837512aj0mm6o', 'line0'), ('/tmp/201911211837512aj0mm6o', 'line1'), ('/tmp/201911211837512aj0mm6o', 'line2'), ('/tmp/201911211837512aj0mm6o', 'line3'), ('/tmp/201911211837512aj0mm6o', 'line4'), ('/tmp/20191121183751ahzccono', 'line0'), ('/tmp/20191121183751ahzccono', 'line1'), ('/tmp/20191121183751ahzccono', 'line2'), ('/tmp/20191121183751ahzccono', 'line3'), ('/tmp/20191121183751ahzccono', 'line4'), ('/tmp/20191121183751hhm9i89k', 'line0'), ('/tmp/20191121183751hhm9i89k', 'line1'), ('/tmp/20191121183751hhm9i89k', 'line2'), ('/tmp/20191121183751hhm9i89k', 'line3'), ('/tmp/20191121183751hhm9i89k', 'line4')] [while running 'assert_that/Match'] {code} Notice that all the filenames share the prefix '/tmp/20191121183751'. The code for this prefix: {code} prefix = datetime.datetime.now().strftime("%Y%m%d%H%M%S") {code} > test_read_from_text_with_file_name_file_pattern is flaky > > > Key: BEAM-7594 > URL: https://issues.apache.org/jira/browse/BEAM-7594 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Priority: Critical > Labels: currently-failing, flake > Fix For: Not applicable > > > cc: [~lcaggio] [~chamikara] > {noformat} > 22:05:08 > == > 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern > (apache_beam.io.textio_test.TextSourceTest) > 22:05:08 > -- > 22:05:08 Traceback (most recent call last): > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py", > line 517, in test_read_from_text_with_file_name_file_pattern > 22:05:08 pipeline.run() > 22:05:08
[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347837 ] ASF GitHub Bot logged work on BEAM-7278: Author: ASF GitHub Bot Created on: 22/Nov/19 00:40 Start Date: 22/Nov/19 00:40 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10184: [BEAM-7278, BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading dependencies. URL: https://github.com/apache/beam/pull/10184 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347837) Time Spent: 3h 10m (was: 3h) > Upgrade some Beam dependencies > -- > > Key: BEAM-7278 > URL: https://issues.apache.org/jira/browse/BEAM-7278 > Project: Beam > Issue Type: Task > Components: dependencies >Reporter: Etienne Chauchot >Assignee: Mujuzi Moses >Priority: Critical > Time Spent: 3h 10m > Remaining Estimate: 0h > > Some dependencies need to be upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347835 ] ASF GitHub Bot logged work on BEAM-8619: Author: ASF GitHub Bot Created on: 22/Nov/19 00:37 Start Date: 22/Nov/19 00:37 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10126: [BEAM-8619] Tear down the DoFns upon the control service termination … URL: https://github.com/apache/beam/pull/10126#discussion_r349383237 ## File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java ## @@ -363,16 +392,43 @@ private BundleProcessor createBundleProcessor(BeamFnApi.InstructionRequest reque tearDownFunctions::add, splitListener); } -return BundleProcessor.create( -startFunctionRegistry, -finishFunctionRegistry, -tearDownFunctions, -allResiduals, -pCollectionConsumerRegistry, -metricsContainerRegistry, -stateTracker, -beamFnStateClient, -queueingClient); +return bundleProcessor; + } + + /** A cache for {@link BundleProcessor}s. */ + private static class BundleProcessorCache { + +private final Map> cachedBundleProcessors; + +BundleProcessorCache() { + this.cachedBundleProcessors = Maps.newConcurrentMap(); +} + +/** + * Get a {@link BundleProcessor} from the cache if it's available. Otherwise, create one using + * the specified bundleProcessorSupplier. + */ +BundleProcessor get( +String bundleDescriptorId, Supplier bundleProcessorSupplier) { + ConcurrentLinkedQueue bundleProcessors = + cachedBundleProcessors.computeIfAbsent( + bundleDescriptorId, descriptorId -> new ConcurrentLinkedQueue<>()); + BundleProcessor bundleProcessor = bundleProcessors.poll(); + if (bundleProcessor != null) { +return bundleProcessor; + } + + return bundleProcessorSupplier.get(); +} + +/** + * Add a {@link BundleProcessor} to cache. The {@link BundleProcessor} will be reset before + * added to the cache. Review comment: `added to the cache.` -> `being added to the cache.` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347835) Time Spent: 1.5h (was: 1h 20m) > Tear down the DoFns upon the control service termination in Java SDK harness > > > Key: BEAM-8619 > URL: https://issues.apache.org/jira/browse/BEAM-8619 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Affects Versions: 2.18.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Per the discussion in the ML, the detail can be found [1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for teardown the DoFns upon the > control service termination in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347833 ] ASF GitHub Bot logged work on BEAM-8619: Author: ASF GitHub Bot Created on: 22/Nov/19 00:37 Start Date: 22/Nov/19 00:37 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10126: [BEAM-8619] Tear down the DoFns upon the control service termination … URL: https://github.com/apache/beam/pull/10126#discussion_r349380364 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/CounterCell.java ## @@ -51,6 +51,12 @@ public CounterCell(MetricName name) { this.name = name; } + @Override + public void reset() { +dirty.afterModification(); Review comment: I don't believe reset() should make this dirty since reset() should set this to the state it is at as if it was newly constructed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347833) Time Spent: 1h 20m (was: 1h 10m) > Tear down the DoFns upon the control service termination in Java SDK harness > > > Key: BEAM-8619 > URL: https://issues.apache.org/jira/browse/BEAM-8619 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Affects Versions: 2.18.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Per the discussion in the ML, the detail can be found [1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for teardown the DoFns upon the > control service termination in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347836 ] ASF GitHub Bot logged work on BEAM-8619: Author: ASF GitHub Bot Created on: 22/Nov/19 00:37 Start Date: 22/Nov/19 00:37 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10126: [BEAM-8619] Tear down the DoFns upon the control service termination … URL: https://github.com/apache/beam/pull/10126#discussion_r349385020 ## File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java ## @@ -429,6 +434,22 @@ void release(String bundleDescriptorId, BundleProcessor bundleProcessor) { bundleProcessor.reset(); cachedBundleProcessors.get(bundleDescriptorId).add(bundleProcessor); } + +/** Shutdown all the cached {@link BundleProcessor}s, running the tearDown() functions. */ +void shutdown() throws Exception { + for (ConcurrentLinkedQueue bundleProcessors : + cachedBundleProcessors.values()) { +for (BundleProcessor bundleProcessor : bundleProcessors) { + // Need to reverse this since we want to call teardown in topological order. Review comment: It is not necessary to reverse the list since teardown can't produce output and should not meaningfully impact other instances. Its the same with startInstance where the order doesn't matter. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347836) Time Spent: 1.5h (was: 1h 20m) > Tear down the DoFns upon the control service termination in Java SDK harness > > > Key: BEAM-8619 > URL: https://issues.apache.org/jira/browse/BEAM-8619 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Affects Versions: 2.18.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Per the discussion in the ML, the detail can be found [1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for teardown the DoFns upon the > control service termination in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347834&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347834 ] ASF GitHub Bot logged work on BEAM-8619: Author: ASF GitHub Bot Created on: 22/Nov/19 00:37 Start Date: 22/Nov/19 00:37 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10126: [BEAM-8619] Tear down the DoFns upon the control service termination … URL: https://github.com/apache/beam/pull/10126#discussion_r349381333 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/GaugeCell.java ## @@ -50,6 +50,12 @@ public GaugeCell(MetricName name) { this.name = name; } + @Override + public void reset() { +dirty.afterModification(); Review comment: I don't believe reset() should make this dirty since reset() should set this to the state it is at as if it was newly constructed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347834) > Tear down the DoFns upon the control service termination in Java SDK harness > > > Key: BEAM-8619 > URL: https://issues.apache.org/jira/browse/BEAM-8619 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Affects Versions: 2.18.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Per the discussion in the ML, the detail can be found [1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for teardown the DoFns upon the > control service termination in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347832 ] ASF GitHub Bot logged work on BEAM-8619: Author: ASF GitHub Bot Created on: 22/Nov/19 00:37 Start Date: 22/Nov/19 00:37 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10126: [BEAM-8619] Tear down the DoFns upon the control service termination … URL: https://github.com/apache/beam/pull/10126#discussion_r349380027 ## File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java ## @@ -226,6 +226,7 @@ public void testUsingUserState() throws Exception { consumers, startFunctionRegistry, finishFunctionRegistry, +new ArrayList<>()::add, Review comment: after ``` Iterables.getOnlyElement(finishFunctionRegistry.getFunctions()).run(); assertThat(mainOutputValues, empty()); ``` add something like: ``` Iterables.getOnlyElement(tearDownFunctionRegistry.getFunctions()).run(); assertThat(mainOutputValues, empty()); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347832) > Tear down the DoFns upon the control service termination in Java SDK harness > > > Key: BEAM-8619 > URL: https://issues.apache.org/jira/browse/BEAM-8619 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Affects Versions: 2.18.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Per the discussion in the ML, the detail can be found [1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for teardown the DoFns upon the > control service termination in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347831 ] ASF GitHub Bot logged work on BEAM-8619: Author: ASF GitHub Bot Created on: 22/Nov/19 00:37 Start Date: 22/Nov/19 00:37 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10126: [BEAM-8619] Tear down the DoFns upon the control service termination … URL: https://github.com/apache/beam/pull/10126#discussion_r349380409 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/DistributionCell.java ## @@ -52,6 +52,12 @@ public DistributionCell(MetricName name) { this.name = name; } + @Override + public void reset() { +dirty.afterModification(); Review comment: I don't believe reset() should make this dirty since reset() should set this to the state it is at as if it was newly constructed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347831) Time Spent: 1h 10m (was: 1h) > Tear down the DoFns upon the control service termination in Java SDK harness > > > Key: BEAM-8619 > URL: https://issues.apache.org/jira/browse/BEAM-8619 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Affects Versions: 2.18.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Per the discussion in the ML, the detail can be found [1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for teardown the DoFns upon the > control service termination in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=347824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347824 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 22/Nov/19 00:26 Start Date: 22/Nov/19 00:26 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#issuecomment-557332997 We're so close. Just a few more. Thanks for your hard work getting through all of this! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347824) Time Spent: 27h 40m (was: 27.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 27h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347823 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 22/Nov/19 00:19 Start Date: 22/Nov/19 00:19 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] Added two unit tests in CombineTest class to test AccumulatingCombine URL: https://github.com/apache/beam/pull/10173#issuecomment-557331371 In the Java parity file, quite a few tests are very similar. Some of them are called "SimpleCombine", and some of them are called "BasicCombine", and some are called "AccumulatingCombine". Their only difference is the CombineFn they use. The Java parity: https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L1448 So here in Python I renamed the tests according to the CombineFn they use: test_MeanCombineFn_combine test_MeanCombineFn_combine_empty This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347823) Time Spent: 18h 10m (was: 18h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 18h 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347818 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 22/Nov/19 00:12 Start Date: 22/Nov/19 00:12 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035#discussion_r349379106 ## File path: sdks/python/apache_beam/transforms/trigger.py ## @@ -965,18 +1000,21 @@ class TriggerDriver(with_metaclass(ABCMeta, object)): """Breaks a series of bundle and timer firings into window (pane)s.""" @abstractmethod - def process_elements(self, state, windowed_values, output_watermark): + def process_elements(self, state, windowed_values, output_watermark, + input_watermark=MIN_TIMESTAMP): pass @abstractmethod - def process_timer(self, window_id, name, time_domain, timestamp, state): + def process_timer(self, window_id, name, time_domain, timestamp, state, +input_watermark=None): pass def process_entire_key( - self, key, windowed_values, output_watermark=MIN_TIMESTAMP): + self, key, windowed_values, input_watermark=MIN_TIMESTAMP, Review comment: Have we verified we found all callers of this code? Inserting a (default) argument before existing arguments could result in an off-by-one error for unmodified callers. Also, the ordering should be consistent with `process_elements()`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347818) Time Spent: 4h 20m (was: 4h 10m) > Python SDK labels ontime empty panes as late > > > Key: BEAM-8581 > URL: https://issues.apache.org/jira/browse/BEAM-8581 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > The GeneralTriggerDriver does not put watermark holds on timers, leading to > the ontime empty pane being considered late data. > Fix: Add a new notion of whether a trigger has an ontime pane. If it does, > then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347817 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 22/Nov/19 00:12 Start Date: 22/Nov/19 00:12 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035#discussion_r349380006 ## File path: sdks/python/apache_beam/transforms/trigger.py ## @@ -1036,14 +1074,17 @@ class BatchGlobalTriggerDriver(TriggerDriver): index=0, nonspeculative_index=0) - def process_elements(self, state, windowed_values, unused_output_watermark): + def process_elements(self, state, windowed_values, + unused_output_watermark=MIN_TIMESTAMP, Review comment: Let's not provide default values here, as it's unclear what they should be. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347817) Time Spent: 4h 20m (was: 4h 10m) > Python SDK labels ontime empty panes as late > > > Key: BEAM-8581 > URL: https://issues.apache.org/jira/browse/BEAM-8581 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > The GeneralTriggerDriver does not put watermark holds on timers, leading to > the ontime empty pane being considered late data. > Fix: Add a new notion of whether a trigger has an ontime pane. If it does, > then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347816 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 22/Nov/19 00:12 Start Date: 22/Nov/19 00:12 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035#discussion_r349379753 ## File path: sdks/python/apache_beam/transforms/trigger.py ## @@ -965,18 +1000,21 @@ class TriggerDriver(with_metaclass(ABCMeta, object)): """Breaks a series of bundle and timer firings into window (pane)s.""" @abstractmethod - def process_elements(self, state, windowed_values, output_watermark): + def process_elements(self, state, windowed_values, output_watermark, + input_watermark=MIN_TIMESTAMP): pass @abstractmethod - def process_timer(self, window_id, name, time_domain, timestamp, state): + def process_timer(self, window_id, name, time_domain, timestamp, state, +input_watermark=None): pass def process_entire_key( - self, key, windowed_values, output_watermark=MIN_TIMESTAMP): + self, key, windowed_values, input_watermark=MIN_TIMESTAMP, Review comment: Is `MIN_TIMESTAMP` the correct default? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347816) Time Spent: 4h 20m (was: 4h 10m) > Python SDK labels ontime empty panes as late > > > Key: BEAM-8581 > URL: https://issues.apache.org/jira/browse/BEAM-8581 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > The GeneralTriggerDriver does not put watermark holds on timers, leading to > the ontime empty pane being considered late data. > Fix: Add a new notion of whether a trigger has an ontime pane. If it does, > then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8804) PCollectionList support in cross-language transforms
[ https://issues.apache.org/jira/browse/BEAM-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-8804: -- Description: Currently, Beam model doesn't have any information on the order of input/output PCollections from PTransforms. Therefore, PCollectionList needs to be converted to PCollectionTuple when it goes across the cross-language boundaries (or even in the same language, whenever it is converted between in-memory object and proto) and it's impossible to recreate PCollectionList from proto with the original order. The possible workaround is just to use PCollectionTuple with integer id (starting from 0 like indexes) instead of PCollectionList. In that case, we should first well-define how we generate proto from PCollectionList since each SDK uses a different convention. (was: Currently, Beam model doesn't have any information on the order of output PCollections from PTransforms. So, PCollectionList needs to be converted to PCollectionTuple when it goes across the cross-language boundary (or even in the same language, when it is converted between in-memory object and proto).) > PCollectionList support in cross-language transforms > > > Key: BEAM-8804 > URL: https://issues.apache.org/jira/browse/BEAM-8804 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > Currently, Beam model doesn't have any information on the order of > input/output PCollections from PTransforms. Therefore, PCollectionList needs > to be converted to PCollectionTuple when it goes across the cross-language > boundaries (or even in the same language, whenever it is converted between > in-memory object and proto) and it's impossible to recreate PCollectionList > from proto with the original order. The possible workaround is just to use > PCollectionTuple with integer id (starting from 0 like indexes) instead of > PCollectionList. In that case, we should first well-define how we generate > proto from PCollectionList since each SDK uses a different convention. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347814 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 22/Nov/19 00:07 Start Date: 22/Nov/19 00:07 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10173: [BEAM-8575] Added two unit tests in CombineTest class to test AccumulatingCombine URL: https://github.com/apache/beam/pull/10173#discussion_r349379825 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -393,6 +395,54 @@ def test_global_fanout(self): | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11)) assert_that(result, equal_to([49.5])) + @attr('ValidatesRunner') + def test_accumulating_combine(self): Review comment: Removed @attr('ValidatesRunner'). Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347814) Time Spent: 18h (was: 17h 50m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 18h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347813 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 22/Nov/19 00:06 Start Date: 22/Nov/19 00:06 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10173: [BEAM-8575] Added two unit tests in CombineTest class to test AccumulatingCombine URL: https://github.com/apache/beam/pull/10173#discussion_r349379585 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -393,6 +395,54 @@ def test_global_fanout(self): | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11)) assert_that(result, equal_to([49.5])) + @attr('ValidatesRunner') + def test_accumulating_combine(self): +with TestPipeline() as p: + input = (p + | beam.Create([('a', 1), + ('a', 1), + ('a', 4), + ('b', 1), + ('b', 13)])) + # The mean of all values regardless of key. + global_mean = (input + | beam.Values() + | beam.CombineGlobally(combine.MeanCombineFn())) + + # The (key, mean) pairs for all keys. + mean_per_key = (input | beam.CombinePerKey(combine.MeanCombineFn())) + + expected_mean_per_key = [('a', 2), ('b', 7)] + assert_that(global_mean, equal_to([4]), label='global mean') + assert_that(mean_per_key, equal_to(expected_mean_per_key), + label='mean per key') + + @attr('ValidatesRunner') + def test_accumulating_combine_empty(self): +# For each element in a PCollection, if it is float('NaN'), then emits +# a string 'NaN', otherwise emits str(element). +class FormatNaNDoFn(beam.DoFn): + def process(self, element): +return ([str(element)], ['NaN'])[math.isnan(element)] + +with TestPipeline() as p: + input = (p | beam.Create([])) + + # Compute the mean of all values in the PCollection, + # then format the mean. Since the Pcollection is empty, + # the mean is float('NaN'), and is formatted to be a string 'NaN'. + global_mean = (input + | beam.Values() + | beam.CombineGlobally(combine.MeanCombineFn()) + | beam.ParDo(FormatNaNDoFn())) Review comment: Good idea. Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347813) Time Spent: 17h 50m (was: 17h 40m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 17h 50m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8805) Remove obsolete worker_threads experiment in tests
[ https://issues.apache.org/jira/browse/BEAM-8805?focusedWorklogId=347812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347812 ] ASF GitHub Bot logged work on BEAM-8805: Author: ASF GitHub Bot Created on: 22/Nov/19 00:03 Start Date: 22/Nov/19 00:03 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10193: [BEAM-8805] Remove obsolete worker_threads experiment in tests URL: https://github.com/apache/beam/pull/10193 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/j
[jira] [Created] (BEAM-8805) Remove obsolete worker_threads experiment in tests
Kyle Weaver created BEAM-8805: - Summary: Remove obsolete worker_threads experiment in tests Key: BEAM-8805 URL: https://issues.apache.org/jira/browse/BEAM-8805 Project: Beam Issue Type: Improvement Components: testing Reporter: Kyle Weaver Assignee: Kyle Weaver As of https://github.com/apache/beam/pull/10123 the worker_threads experiment is obsolete and should be removed from our test scripts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8805) Remove obsolete worker_threads experiment in tests
[ https://issues.apache.org/jira/browse/BEAM-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8805: -- Status: Open (was: Triage Needed) > Remove obsolete worker_threads experiment in tests > -- > > Key: BEAM-8805 > URL: https://issues.apache.org/jira/browse/BEAM-8805 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Minor > > As of https://github.com/apache/beam/pull/10123 the worker_threads experiment > is obsolete and should be removed from our test scripts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky
[ https://issues.apache.org/jira/browse/BEAM-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-7594: --- Assignee: (was: Lorenzo Caggioni) > test_read_from_text_with_file_name_file_pattern is flaky > > > Key: BEAM-7594 > URL: https://issues.apache.org/jira/browse/BEAM-7594 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Valentyn Tymofieiev >Priority: Critical > Labels: currently-failing, flake > Fix For: Not applicable > > > cc: [~lcaggio] [~chamikara] > {noformat} > 22:05:08 > == > 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern > (apache_beam.io.textio_test.TextSourceTest) > 22:05:08 > -- > 22:05:08 Traceback (most recent call last): > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py", > line 517, in test_read_from_text_with_file_name_file_pattern > 22:05:08 pipeline.run() > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 22:05:08 else test_runner_api)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 406, in run > 22:05:08 self._options).run(False) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 419, in run > 22:05:08 return self.runner.run_pipeline(self, self._options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 128, in run_pipeline > 22:05:08 return runner.run_pipeline(pipeline, options) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 294, in run_pipeline > 22:05:08 default_environment=self._default_environment)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 301, in run_via_runner_api > 22:05:08 return self.run_stages(stage_context, stages) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 383, in run_stages > 22:05:08 stage_context.safe_coders) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 655, in _run_stage > 22:05:08 result, splits = bundle_manager.process_bundle(data_input, > data_output) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1471, in process_bundle > 22:05:08 result_future = > self._controller.control_handler.push(process_bundle_req) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 990, in push > 22:05:08 response = self.worker.do_instruction(request) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", > line 342, in do_instruction > 22:05:08 request.instruction_id) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", > line 368, in process_bundle > 22:05:08 bundle_processor.process_bundle(instruction_id)) > 22:05:08 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py", > line 593, in process_bundle > 22:05:08 data.ptransform_id
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347809&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347809 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 23:57 Start Date: 21/Nov/19 23:57 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10070: [BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr… URL: https://github.com/apache/beam/pull/10070#discussion_r349377287 ## File path: sdks/python/apache_beam/transforms/util_test.py ## @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self): label='after reshuffle') pipeline.run() + @attr('ValidatesRunner') + def test_reshuffle_preserves_timestamps(self): +pipeline = TestPipeline() + +# Create a PCollection and assign each element with a different timestamp. +before_reshuffle = (pipeline +| "Four elements" >> beam.Create([ +{'name': 'foo', 'timestamp': MIN_TIMESTAMP}, +{'name': 'foo', 'timestamp': 0}, +{'name': 'bar', 'timestamp': 33}, +{'name': 'bar', 'timestamp': MAX_TIMESTAMP}, +]) +| "With timestamp" >> beam.Map( +lambda element: beam.window.TimestampedValue( +element, element['timestamp']))) + +# For each element in a PCollection, gets the current timestamp of the +# element and reassigns the timestamp to the element. +class AddTimestamp(beam.DoFn): + def process(self, element, timestamp=beam.DoFn.TimestampParam): +yield beam.window.TimestampedValue(element, timestamp) + +# Reshuffle the PCollection above and assign the timestamp of an element to +# that element again. +after_reshuffle = (before_reshuffle + | "Reshuffle" >> beam.Reshuffle() + | "With timestamps again" >> beam.ParDo(AddTimestamp())) + +# Given an element, emits a string which contains the timestamp and the name +# field of the element. +class FormatWithTimestamp(beam.DoFn): Review comment: You can have a method ``` def format_with_timestamp(element, timestamp=beam.DoFn.TimestampParam): ... ``` rather than a full DoFn. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347809) Time Spent: 17h 40m (was: 17.5h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 17h 40m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=347807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347807 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 21/Nov/19 23:55 Start Date: 21/Nov/19 23:55 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035#issuecomment-557325606 There hasn't been any review in the last 9 days, so I'm asking @pabloem to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347807) Time Spent: 4h 10m (was: 4h) > Python SDK labels ontime empty panes as late > > > Key: BEAM-8581 > URL: https://issues.apache.org/jira/browse/BEAM-8581 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > The GeneralTriggerDriver does not put watermark holds on timers, leading to > the ontime empty pane being considered late data. > Fix: Add a new notion of whether a trigger has an ontime pane. If it does, > then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=347805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347805 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 21/Nov/19 23:54 Start Date: 21/Nov/19 23:54 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r349376331 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) + | 'GBK' >> beam.GroupByKey() + | 'Sum' >> beam.ParDo(MyDoFn())) Review comment: So what you'd want to do is create some class with custom pickling (implement `__reduce__`). This picking would be artificially large, e.g. include a dummy `'a' * 1000` value, as many runners trigger on serialized size not element count. In the constructor, you would increment a class-level variable to indicate how many are alive. In the destructor (`__del__`), you would decrement it. An error would be thrown if there are too many. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347805) Time Spent: 8h (was: 7h 50m) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347806 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 23:54 Start Date: 21/Nov/19 23:54 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on issue #10070: [BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr… URL: https://github.com/apache/beam/pull/10070#issuecomment-557325303 Modified code according to the reviewer's second round of feedback, except that I still need DoFn to get timestamp. Waiting for reviewer to resolve conversations. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347806) Time Spent: 17.5h (was: 17h 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 17.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347804 ] ASF GitHub Bot logged work on BEAM-7850: Author: ASF GitHub Bot Created on: 21/Nov/19 23:54 Start Date: 21/Nov/19 23:54 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10183: [BEAM-7850] Makes environment ID a top level attribute of PTransform. URL: https://github.com/apache/beam/pull/10183#discussion_r349376284 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -698,10 +700,10 @@ message StandardCoders { // TODO: consider inlining field on PCollection message WindowingStrategy { - // (Required) The SdkFunctionSpec of the UDF that assigns windows, + // (Required) The FunctionSpec of the UDF that assigns windows, // merges windows, and shifts timestamps before they are // combined according to the OutputTime. - SdkFunctionSpec window_fn = 1; + FunctionSpec window_fn = 1; Review comment: After hearing about some of the difficulties that Cham is running into. I would go either way on whether we add an environment id here or remove it completely. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347804) Time Spent: 1.5h (was: 1h 20m) > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8804) PCollectionList support in cross-language transforms
[ https://issues.apache.org/jira/browse/BEAM-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee reassigned BEAM-8804: - Assignee: Heejong Lee > PCollectionList support in cross-language transforms > > > Key: BEAM-8804 > URL: https://issues.apache.org/jira/browse/BEAM-8804 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > Currently, Beam model doesn't have any information on the order of output > PCollections from PTransforms. So, PCollectionList needs to be converted to > PCollectionTuple when it goes across the cross-language boundary (or even in > the same language, when it is converted between in-memory object and proto). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8804) PCollectionList support in cross-language transforms
[ https://issues.apache.org/jira/browse/BEAM-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-8804: -- Status: Open (was: Triage Needed) > PCollectionList support in cross-language transforms > > > Key: BEAM-8804 > URL: https://issues.apache.org/jira/browse/BEAM-8804 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > Currently, Beam model doesn't have any information on the order of output > PCollections from PTransforms. So, PCollectionList needs to be converted to > PCollectionTuple when it goes across the cross-language boundary (or even in > the same language, when it is converted between in-memory object and proto). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347800 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 23:50 Start Date: 21/Nov/19 23:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10159: [BEAM-8575] Added a unit test to CombineTest class to test that Combi… URL: https://github.com/apache/beam/pull/10159#discussion_r349375222 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -393,6 +398,18 @@ def test_global_fanout(self): | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11)) assert_that(result, equal_to([49.5])) + @attr('ValidatesRunner') + def test_hot_key_combining_with_accumulation_mode(self): +with TestPipeline() as p: + result = (p +| beam.Create([1, 2, 3, 4, 5]) +| beam.WindowInto(GlobalWindows(), + trigger=Repeatedly(AfterCount(1)), + accumulation_mode= + AccumulationMode.ACCUMULATING) +| beam.CombineGlobally(sum).without_defaults().with_fanout(2)) Review comment: Why specify without_defaults()? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347800) Time Spent: 17h 10m (was: 17h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 17h 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347801 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 23:50 Start Date: 21/Nov/19 23:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10159: [BEAM-8575] Added a unit test to CombineTest class to test that Combi… URL: https://github.com/apache/beam/pull/10159#discussion_r349375288 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -393,6 +398,18 @@ def test_global_fanout(self): | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11)) assert_that(result, equal_to([49.5])) + @attr('ValidatesRunner') + def test_hot_key_combining_with_accumulation_mode(self): +with TestPipeline() as p: + result = (p +| beam.Create([1, 2, 3, 4, 5]) +| beam.WindowInto(GlobalWindows(), + trigger=Repeatedly(AfterCount(1)), Review comment: We would need to use a test stream to make this test non-trivial. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347801) Time Spent: 17h 20m (was: 17h 10m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 17h 20m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347799 ] ASF GitHub Bot logged work on BEAM-7948: Author: ASF GitHub Bot Created on: 21/Nov/19 23:46 Start Date: 21/Nov/19 23:46 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s… URL: https://github.com/apache/beam/pull/9949#issuecomment-557323260 Just remove the synchronized from `BeamFnDataSizeBasedBufferingOutboundObserver.flush()` and add ``` @Override public synchronized void flush() throws IOException { super.flush(); } ``` to the time based outbound observer and we can merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347799) Time Spent: 3h 20m (was: 3h 10m) > Add time-based cache threshold support in the Java data service > --- > > Key: BEAM-7948 > URL: https://issues.apache.org/jira/browse/BEAM-7948 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently only size-based cache threshold is supported in data service. It > should also support the time-based cache threshold. This is very important, > especially for streaming jobs which are sensitive to the delay. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347798 ] ASF GitHub Bot logged work on BEAM-7850: Author: ASF GitHub Bot Created on: 21/Nov/19 23:45 Start Date: 21/Nov/19 23:45 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10183: [BEAM-7850] Makes environment ID a top level attribute of PTransform. URL: https://github.com/apache/beam/pull/10183#discussion_r349374247 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -698,10 +700,10 @@ message StandardCoders { // TODO: consider inlining field on PCollection message WindowingStrategy { - // (Required) The SdkFunctionSpec of the UDF that assigns windows, + // (Required) The FunctionSpec of the UDF that assigns windows, // merges windows, and shifts timestamps before they are // combined according to the OutputTime. - SdkFunctionSpec window_fn = 1; + FunctionSpec window_fn = 1; Review comment: The windowing strategy cannot be coerced into an environment-agnostic one the same way a coder can be, as it involves actually executing user code as oppose to specifying constraints on its behavior. (In some sense, it is in a very real sense a cross-language UDF.) Having to traverse the graph is ugly, but I suppose possible for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347798) Time Spent: 1h 20m (was: 1h 10m) > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347797 ] ASF GitHub Bot logged work on BEAM-7948: Author: ASF GitHub Bot Created on: 21/Nov/19 23:45 Start Date: 21/Nov/19 23:45 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s… URL: https://github.com/apache/beam/pull/9949#issuecomment-557323260 Just remove the synchronized from BeamFnDataSizeBasedBufferingOutboundObserver.java and add ``` @Override public synchronized void flush() throws IOException { super.flush(); } ``` to the time based outbound observer and we can merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347797) Time Spent: 3h 10m (was: 3h) > Add time-based cache threshold support in the Java data service > --- > > Key: BEAM-7948 > URL: https://issues.apache.org/jira/browse/BEAM-7948 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently only size-based cache threshold is supported in data service. It > should also support the time-based cache threshold. This is very important, > especially for streaming jobs which are sensitive to the delay. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347795 ] ASF GitHub Bot logged work on BEAM-7948: Author: ASF GitHub Bot Created on: 21/Nov/19 23:43 Start Date: 21/Nov/19 23:43 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s… URL: https://github.com/apache/beam/pull/9949#discussion_r349373751 ## File path: sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserver.java ## @@ -48,25 +46,27 @@ Coder coder, StreamObserver outboundObserver) { super(sizeLimit, outputLocation, coder, outboundObserver); -this.lock = new Object(); +this.flushLock = new Object(); this.flushFuture = Executors.newSingleThreadScheduledExecutor( new ThreadFactoryBuilder() .setDaemon(true) .setNameFormat("DataBufferOutboundFlusher-thread") .build()) .scheduleAtFixedRate(this::periodicFlush, timeLimit, timeLimit, TimeUnit.MILLISECONDS); Review comment: Your right, I missed the fact that there was no periodic schedule that took a callable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347795) Time Spent: 3h (was: 2h 50m) > Add time-based cache threshold support in the Java data service > --- > > Key: BEAM-7948 > URL: https://issues.apache.org/jira/browse/BEAM-7948 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Currently only size-based cache threshold is supported in data service. It > should also support the time-based cache threshold. This is very important, > especially for streaming jobs which are sensitive to the delay. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347796 ] ASF GitHub Bot logged work on BEAM-7948: Author: ASF GitHub Bot Created on: 21/Nov/19 23:43 Start Date: 21/Nov/19 23:43 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s… URL: https://github.com/apache/beam/pull/9949#discussion_r349372594 ## File path: sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataSizeBasedBufferingOutboundObserver.java ## @@ -85,7 +85,7 @@ public void close() throws Exception { } @Override - public void flush() throws IOException { + public synchronized void flush() throws IOException { Review comment: We don't want to make this synchronized since this class is not thread safe and should not take this perf hit. Your previous usage of defining an override that is synchronized for the class that called super.flush() is all you need. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347796) > Add time-based cache threshold support in the Java data service > --- > > Key: BEAM-7948 > URL: https://issues.apache.org/jira/browse/BEAM-7948 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Currently only size-based cache threshold is supported in data service. It > should also support the time-based cache threshold. This is very important, > especially for streaming jobs which are sensitive to the delay. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=347788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347788 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 21/Nov/19 23:16 Start Date: 21/Nov/19 23:16 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10175: [BEAM-7390] Add code snippet for Max URL: https://github.com/apache/beam/pull/10175#discussion_r349364098 ## File path: sdks/python/apache_beam/examples/snippets/transforms/aggregation/max.py ## @@ -0,0 +1,60 @@ +# coding=utf-8 Review comment: Does each of these need their own file? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347788) Time Spent: 4h 10m (was: 4h) > Colab examples for aggregation transforms (Python) > -- > > Key: BEAM-7390 > URL: https://issues.apache.org/jira/browse/BEAM-7390 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 4h 10m > Remaining Estimate: 0h > > Merge aggregation Colabs into the transform catalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347786 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 23:12 Start Date: 21/Nov/19 23:12 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10070: [BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr… URL: https://github.com/apache/beam/pull/10070#discussion_r349364637 ## File path: sdks/python/apache_beam/transforms/util_test.py ## @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self): label='after reshuffle') pipeline.run() + @attr('ValidatesRunner') + def test_reshuffle_preserves_timestamps(self): +pipeline = TestPipeline() + +# Create a PCollection and assign each element with a different timestamp. +before_reshuffle = (pipeline +| "Four elements" >> beam.Create([ +{'name': 'foo', 'timestamp': MIN_TIMESTAMP}, +{'name': 'foo', 'timestamp': 0}, +{'name': 'bar', 'timestamp': 33}, +{'name': 'bar', 'timestamp': MAX_TIMESTAMP}, +]) +| "With timestamp" >> beam.Map( +lambda element: beam.window.TimestampedValue( +element, element['timestamp']))) + +# For each element in a PCollection, gets the current timestamp of the +# element and reassigns the timestamp to the element. +class AddTimestamp(beam.DoFn): + def process(self, element, timestamp=beam.DoFn.TimestampParam): +yield beam.window.TimestampedValue(element, timestamp) + +# Reshuffle the PCollection above and assign the timestamp of an element to +# that element again. +after_reshuffle = (before_reshuffle + | "Reshuffle" >> beam.Reshuffle() + | "With timestamps again" >> beam.ParDo(AddTimestamp())) + +# Given an element, emits a string which contains the timestamp and the name +# field of the element. +class FormatWithTimestamp(beam.DoFn): Review comment: I need beam.DoFn.TimestampParam to get the timestamp of each element. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347786) Time Spent: 17h (was: 16h 50m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 17h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347785 ] ASF GitHub Bot logged work on BEAM-7278: Author: ASF GitHub Bot Created on: 21/Nov/19 23:04 Start Date: 21/Nov/19 23:04 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10184: [BEAM-7278, BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading dependencies. URL: https://github.com/apache/beam/pull/10184#issuecomment-557312272 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347785) Time Spent: 3h (was: 2h 50m) > Upgrade some Beam dependencies > -- > > Key: BEAM-7278 > URL: https://issues.apache.org/jira/browse/BEAM-7278 > Project: Beam > Issue Type: Task > Components: dependencies >Reporter: Etienne Chauchot >Assignee: Mujuzi Moses >Priority: Critical > Time Spent: 3h > Remaining Estimate: 0h > > Some dependencies need to be upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347782 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 22:55 Start Date: 21/Nov/19 22:55 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10070: [BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr… URL: https://github.com/apache/beam/pull/10070#discussion_r349359544 ## File path: sdks/python/apache_beam/transforms/util_test.py ## @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self): label='after reshuffle') pipeline.run() + @attr('ValidatesRunner') + def test_reshuffle_preserves_timestamps(self): +pipeline = TestPipeline() + +# Create a PCollection and assign each element with a different timestamp. +before_reshuffle = (pipeline +| "Four elements" >> beam.Create([ +{'name': 'foo', 'timestamp': MIN_TIMESTAMP}, +{'name': 'foo', 'timestamp': 0}, +{'name': 'bar', 'timestamp': 33}, +{'name': 'bar', 'timestamp': MAX_TIMESTAMP}, +]) +| "With timestamp" >> beam.Map( +lambda element: beam.window.TimestampedValue( +element, element['timestamp']))) + +# For each element in a PCollection, gets the current timestamp of the +# element and reassigns the timestamp to the element. +class AddTimestamp(beam.DoFn): + def process(self, element, timestamp=beam.DoFn.TimestampParam): +yield beam.window.TimestampedValue(element, timestamp) + +# Reshuffle the PCollection above and assign the timestamp of an element to +# that element again. +after_reshuffle = (before_reshuffle + | "Reshuffle" >> beam.Reshuffle() + | "With timestamps again" >> beam.ParDo(AddTimestamp())) Review comment: I agree. Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347782) Time Spent: 16h 50m (was: 16h 40m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 16h 50m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347780 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 22:54 Start Date: 21/Nov/19 22:54 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10070: [BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr… URL: https://github.com/apache/beam/pull/10070#discussion_r349359198 ## File path: sdks/python/apache_beam/transforms/util_test.py ## @@ -423,6 +425,70 @@ def test_reshuffle_streaming_global_window(self): label='after reshuffle') pipeline.run() + @attr('ValidatesRunner') + def test_reshuffle_preserves_timestamps(self): +pipeline = TestPipeline() + +# Create a PCollection and assign each element with a different timestamp. +before_reshuffle = (pipeline +| "Four elements" >> beam.Create([ +{'name': 'foo', 'timestamp': MIN_TIMESTAMP}, +{'name': 'foo', 'timestamp': 0}, +{'name': 'bar', 'timestamp': 33}, +{'name': 'bar', 'timestamp': MAX_TIMESTAMP}, +]) +| "With timestamp" >> beam.Map( +lambda element: beam.window.TimestampedValue( +element, element['timestamp']))) + +# For each element in a PCollection, gets the current timestamp of the +# element and reassigns the timestamp to the element. +class AddTimestamp(beam.DoFn): + def process(self, element, timestamp=beam.DoFn.TimestampParam): +yield beam.window.TimestampedValue(element, timestamp) + +# Reshuffle the PCollection above and assign the timestamp of an element to +# that element again. +after_reshuffle = (before_reshuffle + | "Reshuffle" >> beam.Reshuffle() Review comment: I agree. Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347780) Time Spent: 16h 40m (was: 16.5h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 16h 40m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347776 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 22:52 Start Date: 21/Nov/19 22:52 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10070: [BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr… URL: https://github.com/apache/beam/pull/10070#discussion_r349358685 ## File path: sdks/python/apache_beam/transforms/util_test.py ## @@ -276,6 +279,13 @@ def process(self, element): with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'): pipeline.run() +class AddTimestamp(beam.DoFn): + def process(self, element, timestamp=beam.DoFn.TimestampParam): +yield beam.window.TimestampedValue(element, timestamp) Review comment: I agree. Thank you for pointing it out! Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347776) Time Spent: 16.5h (was: 16h 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 16.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347775 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 22:52 Start Date: 21/Nov/19 22:52 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10070: [BEAM-8575] Added a unit test for Reshuffle to test that Reshuffle pr… URL: https://github.com/apache/beam/pull/10070#discussion_r349358685 ## File path: sdks/python/apache_beam/transforms/util_test.py ## @@ -276,6 +279,13 @@ def process(self, element): with self.assertRaisesRegex(ValueError, r'window.*None.*add_timestamps2'): pipeline.run() +class AddTimestamp(beam.DoFn): + def process(self, element, timestamp=beam.DoFn.TimestampParam): +yield beam.window.TimestampedValue(element, timestamp) Review comment: I agree. Thank you for pointing it out! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347775) Time Spent: 16h 20m (was: 16h 10m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=347773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347773 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 21/Nov/19 22:50 Start Date: 21/Nov/19 22:50 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10192: [BEAM-3865] Stronger trigger tests. URL: https://github.com/apache/beam/pull/10192#issuecomment-557307971 R: @HuangLED This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347773) Time Spent: 2h 10m (was: 2h) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.
[ https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347771 ] ASF GitHub Bot logged work on BEAM-8802: Author: ASF GitHub Bot Created on: 21/Nov/19 22:50 Start Date: 21/Nov/19 22:50 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10191: [BEAM-8802] Don't clear watermark hold when adding elements. URL: https://github.com/apache/beam/pull/10191#issuecomment-557307841 Thanks. I'll wait for all tests to pass before merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347771) Time Spent: 50m (was: 40m) > Timestamp combiner not respected across bundles in streaming mode. > -- > > Key: BEAM-8802 > URL: https://issues.apache.org/jira/browse/BEAM-8802 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=347770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347770 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 21/Nov/19 22:49 Start Date: 21/Nov/19 22:49 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10192: [BEAM-3865] Stronger trigger tests. URL: https://github.com/apache/beam/pull/10192 When possible, runs trigger fns over various permutations and bundling of the inputs (using different keys), ensuring the results are still correct. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/
[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.
[ https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347764 ] ASF GitHub Bot logged work on BEAM-8802: Author: ASF GitHub Bot Created on: 21/Nov/19 22:43 Start Date: 21/Nov/19 22:43 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10191: [BEAM-8802] Don't clear watermark hold when adding elements. URL: https://github.com/apache/beam/pull/10191#discussion_r349355630 ## File path: sdks/python/apache_beam/transforms/timeutil.py ## @@ -64,19 +64,19 @@ class TimestampCombinerImpl(with_metaclass(ABCMeta, object)): @abstractmethod def assign_output_time(self, window, input_timestamp): -pass +raise NotImplementedError @abstractmethod def combine(self, output_timestamp, other_output_timestamp): -pass +raise NotImplementedError def combine_all(self, merging_timestamps): """Apply combine to list of timestamps.""" combined_output_time = None for output_time in merging_timestamps: if combined_output_time is None: combined_output_time = output_time - else: + elif output_time is not None: combined_output_time = self.combine( combined_output_time, output_time) Review comment: what would be the "else" after this change, and what would it imply when this happens? Shall we at least log something when fall into that situation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347764) Time Spent: 40m (was: 0.5h) > Timestamp combiner not respected across bundles in streaming mode. > -- > > Key: BEAM-8802 > URL: https://issues.apache.org/jira/browse/BEAM-8802 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.
[ https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347763 ] ASF GitHub Bot logged work on BEAM-8802: Author: ASF GitHub Bot Created on: 21/Nov/19 22:43 Start Date: 21/Nov/19 22:43 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10191: [BEAM-8802] Don't clear watermark hold when adding elements. URL: https://github.com/apache/beam/pull/10191#discussion_r349355630 ## File path: sdks/python/apache_beam/transforms/timeutil.py ## @@ -64,19 +64,19 @@ class TimestampCombinerImpl(with_metaclass(ABCMeta, object)): @abstractmethod def assign_output_time(self, window, input_timestamp): -pass +raise NotImplementedError @abstractmethod def combine(self, output_timestamp, other_output_timestamp): -pass +raise NotImplementedError def combine_all(self, merging_timestamps): """Apply combine to list of timestamps.""" combined_output_time = None for output_time in merging_timestamps: if combined_output_time is None: combined_output_time = output_time - else: + elif output_time is not None: combined_output_time = self.combine( combined_output_time, output_time) Review comment: what would be the "else" after this change, What would it imply when this happens? Shall we at least log something when fall into that situation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347763) Time Spent: 0.5h (was: 20m) > Timestamp combiner not respected across bundles in streaming mode. > -- > > Key: BEAM-8802 > URL: https://issues.apache.org/jira/browse/BEAM-8802 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=347761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347761 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 21/Nov/19 22:42 Start Date: 21/Nov/19 22:42 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10081: [BEAM-8645] A test case for TimestampCombiner. URL: https://github.com/apache/beam/pull/10081#discussion_r349355406 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -480,6 +486,74 @@ def test_with_input_types_decorator_violation(self): pc = p | Create(l_3_tuple) _ = pc | beam.CombineGlobally(self.fn) +# +# Test cases for streaming. +# +@attr('ValidatesRunner') +class TimestampCombinerTest(unittest.TestCase): + + @unittest.skip('BEAM-8657') + def test_combiner_earliest(self): +"""Test TimestampCombiner with EARLIEST.""" +options = PipelineOptions(streaming=True) +with TestPipeline(options=options) as p: + result = (p +| TestStream() +.add_elements([window.TimestampedValue(('k', 100), 2)]) +.add_elements([window.TimestampedValue(('k', 400), 7)]) +.advance_watermark_to_infinity() +| beam.WindowInto( +window.FixedWindows(10), +timestamp_combiner=TimestampCombiner.OUTPUT_AT_EARLIEST) +| beam.CombinePerKey(sum)) + + records = (result + | beam.Map(lambda e, ts=beam.DoFn.TimestampParam: (e, ts))) + + # All the KV pairs are applied GBK using EARLIEST timestamp for the same + # key. + expected_window_to_elements = { + window.IntervalWindow(0, 10): [ + (('k', 500), Timestamp(2)), + ], + } + + assert_that( + records, + equal_to_per_window(expected_window_to_elements), + use_global_window=False, + label='assert per window') + + def test_combiner_latest(self): +"""Test TimestampCombiner with LATEST.""" +options = PipelineOptions(streaming=True) +with TestPipeline(options=options) as p: + result = (p +| TestStream() +.add_elements([window.TimestampedValue(('k', 100), 2)]) +.add_elements([window.TimestampedValue(('k', 400), 7)]) Review comment: Turns out this happens to work because 7 is the last element. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347761) Time Spent: 7h 50m (was: 7h 40m) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.
[ https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347757 ] ASF GitHub Bot logged work on BEAM-8802: Author: ASF GitHub Bot Created on: 21/Nov/19 22:40 Start Date: 21/Nov/19 22:40 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10191: [BEAM-8802] Don't clear watermark hold when adding elements. URL: https://github.com/apache/beam/pull/10191 This fixes the errors exposed at https://github.com/apache/beam/pull/10081 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://buil
[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.
[ https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=347758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347758 ] ASF GitHub Bot logged work on BEAM-8802: Author: ASF GitHub Bot Created on: 21/Nov/19 22:40 Start Date: 21/Nov/19 22:40 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10191: [BEAM-8802] Don't clear watermark hold when adding elements. URL: https://github.com/apache/beam/pull/10191#issuecomment-557305006 R: @HuangLED This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347758) Time Spent: 20m (was: 10m) > Timestamp combiner not respected across bundles in streaming mode. > -- > > Key: BEAM-8802 > URL: https://issues.apache.org/jira/browse/BEAM-8802 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347755 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 22:39 Start Date: 21/Nov/19 22:39 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10159: [BEAM-8575] Added a unit test to CombineTest class to test that Combi… URL: https://github.com/apache/beam/pull/10159#discussion_r349354363 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -393,6 +398,18 @@ def test_global_fanout(self): | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11)) assert_that(result, equal_to([49.5])) + @attr('ValidatesRunner') Review comment: Why is this validates runner? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347755) Time Spent: 16h 10m (was: 16h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 16h 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347752 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 22:35 Start Date: 21/Nov/19 22:35 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10190: [BEAM-8575] Added two unit tests to CombineTest class to test that Co… URL: https://github.com/apache/beam/pull/10190#issuecomment-557303427 The mean combine fn already covers this test case completely. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347752) Time Spent: 16h (was: 15h 50m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 16h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8804) PCollectionList support in cross-language transforms
[ https://issues.apache.org/jira/browse/BEAM-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-8804: -- Description: Currently, Beam model doesn't have any information on the order of output PCollections from PTransforms. So, PCollectionList needs to be converted to PCollectionTuple when it goes across the cross-language boundary (or even in the same language, when it is converted between in-memory object and proto). (was: Currently, Beam model doesn't have any information on the order of output PCollections from PTransforms. So, PCollectionList needs to be converted to PCollectionTuple when it goes across the cross-language boundary (or even in the same language, when it is converted between in-memory object and proto). Maybe we should add more fields in PTransform proto definition to keep the ordering information.) > PCollectionList support in cross-language transforms > > > Key: BEAM-8804 > URL: https://issues.apache.org/jira/browse/BEAM-8804 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Heejong Lee >Priority: Major > > Currently, Beam model doesn't have any information on the order of output > PCollections from PTransforms. So, PCollectionList needs to be converted to > PCollectionTuple when it goes across the cross-language boundary (or even in > the same language, when it is converted between in-memory object and proto). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8739) Consistently use with Pipeline(...) syntax
[ https://issues.apache.org/jira/browse/BEAM-8739?focusedWorklogId=347751&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347751 ] ASF GitHub Bot logged work on BEAM-8739: Author: ASF GitHub Bot Created on: 21/Nov/19 22:33 Start Date: 21/Nov/19 22:33 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10149: [BEAM-8739] Consistently use with Pipeline(...) syntax URL: https://github.com/apache/beam/pull/10149#issuecomment-557302872 R: @ibzib lint and tests are now all happy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347751) Time Spent: 20m (was: 10m) > Consistently use with Pipeline(...) syntax > -- > > Key: BEAM-8739 > URL: https://issues.apache.org/jira/browse/BEAM-8739 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > I've run into a couple of tests that forgot to do p.run(). In addition, I'm > seeing new tests written in this old style. We should consistently use the > with syntax where possible for our examples and tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347746 ] ASF GitHub Bot logged work on BEAM-7278: Author: ASF GitHub Bot Created on: 21/Nov/19 22:25 Start Date: 21/Nov/19 22:25 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10184: [BEAM-7278, BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading dependencies. URL: https://github.com/apache/beam/pull/10184#issuecomment-557300482 Thank you for pointing 404 URL error. Fixing pom.xml. Documentation: https://github.com/GoogleCloudPlatform/cloud-opensource-java/tree/master/dependencies This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347746) Time Spent: 2h 50m (was: 2h 40m) > Upgrade some Beam dependencies > -- > > Key: BEAM-7278 > URL: https://issues.apache.org/jira/browse/BEAM-7278 > Project: Beam > Issue Type: Task > Components: dependencies >Reporter: Etienne Chauchot >Assignee: Mujuzi Moses >Priority: Critical > Time Spent: 2h 50m > Remaining Estimate: 0h > > Some dependencies need to be upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347741 ] ASF GitHub Bot logged work on BEAM-7278: Author: ASF GitHub Bot Created on: 21/Nov/19 22:19 Start Date: 21/Nov/19 22:19 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #10184: [BEAM-7278, BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading dependencies. URL: https://github.com/apache/beam/pull/10184#discussion_r349347092 ## File path: build.gradle ## @@ -294,3 +294,41 @@ release { pushToRemote = '' } } + +// Reports linkage errors across multiple Apache Beam artifact ids. +// +// To use (from the root of project): +//./gradlew -Ppublishing -PjavaLinkageArtifactIds=artifactId1,artifactId2,... :checkJavaLinkage +// +// For example: +//./gradlew -Ppublishing -PjavaLinkageArtifactIds=beam-sdks-java-core,beam-sdks-java-io-jdbc :checkJavaLinkage +// +// Note that this task publishes artifacts into your local Maven repository. +if (project.hasProperty('javaLinkageArtifactIds')) { Review comment: Ah, I misread how you were using this property. But it would seem nice to base it on the current project's runtime scope. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347741) Time Spent: 2h 40m (was: 2.5h) > Upgrade some Beam dependencies > -- > > Key: BEAM-7278 > URL: https://issues.apache.org/jira/browse/BEAM-7278 > Project: Beam > Issue Type: Task > Components: dependencies >Reporter: Etienne Chauchot >Assignee: Mujuzi Moses >Priority: Critical > Time Spent: 2h 40m > Remaining Estimate: 0h > > Some dependencies need to be upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347738 ] ASF GitHub Bot logged work on BEAM-7278: Author: ASF GitHub Bot Created on: 21/Nov/19 22:12 Start Date: 21/Nov/19 22:12 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10184: [BEAM-7278, BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading dependencies. URL: https://github.com/apache/beam/pull/10184#issuecomment-557296259 Can you also add some links to documentation for the tool? I was just looking around for it, following the pointer from the maven central coordinates, which points to a real package but the web page registered for that package is a 404. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347738) Time Spent: 2.5h (was: 2h 20m) > Upgrade some Beam dependencies > -- > > Key: BEAM-7278 > URL: https://issues.apache.org/jira/browse/BEAM-7278 > Project: Beam > Issue Type: Task > Components: dependencies >Reporter: Etienne Chauchot >Assignee: Mujuzi Moses >Priority: Critical > Time Spent: 2.5h > Remaining Estimate: 0h > > Some dependencies need to be upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8804) PCollectionList support in cross-language transforms
Heejong Lee created BEAM-8804: - Summary: PCollectionList support in cross-language transforms Key: BEAM-8804 URL: https://issues.apache.org/jira/browse/BEAM-8804 Project: Beam Issue Type: Improvement Components: beam-model Reporter: Heejong Lee Currently, Beam model doesn't have any information on the order of output PCollections from PTransforms. So, PCollectionList needs to be converted to PCollectionTuple when it goes across the cross-language boundary (or even in the same language, when it is converted between in-memory object and proto). Maybe we should add more fields in PTransform proto definition to keep the ordering information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver reassigned BEAM-8512: - Assignee: Kyle Weaver (was: Robert Bradshaw) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 20m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=347728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347728 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 21/Nov/19 21:53 Start Date: 21/Nov/19 21:53 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#issuecomment-557289249 Thanks Ning! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347728) Time Spent: 7h 10m (was: 7h) > Render Beam Pipeline as DOT with Interactive Beam > --- > > Key: BEAM-8016 > URL: https://issues.apache.org/jira/browse/BEAM-8016 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline > converted to DOT then rendered should mark user defined variables on edges. > With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be > redundant or confusing to render arbitrary random sample PCollection data on > edges. > We'll also make sure edges in the graph corresponds to output -> input > relationship in the user defined pipeline. Each edge is one output. If > multiple down stream inputs take the same output, it should be rendered as > one edge diverging into two instead of two edges. > For advanced interactivity highlight where each execution highlights the part > of the pipeline really executed from the original pipeline, we'll also > provide the support in beta. -- This message was sent by Atlassian Jira (v8.3.4#803005)