[beam] branch master updated: [BEAM-12388] Add caching to deferred dataframes
This is an automated email from the ASF dual-hosted git repository. ningk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new 7f63376 [BEAM-12388] Add caching to deferred dataframes new aa9eae5 Merge pull request #15180 from rohdesamuel/int-df-cache-2 7f63376 is described below commit 7f6337641a4c21336dab86d3a9a1ce172d02627d Author: Sam R AuthorDate: Mon May 10 17:46:03 2021 -0700 [BEAM-12388] Add caching to deferred dataframes This adds caching to any dataframes using the InteractiveRuner. --- .../runners/interactive/caching/__init__.py| 1 - .../interactive/caching/expression_cache.py| 109 ++ .../interactive/caching/expression_cache_test.py | 128 + .../runners/interactive/interactive_beam.py| 13 +-- .../runners/interactive/interactive_runner_test.py | 75 .../runners/interactive/recording_manager.py | 3 +- .../apache_beam/runners/interactive/utils.py | 17 +++ 7 files changed, 333 insertions(+), 13 deletions(-) diff --git a/sdks/python/apache_beam/runners/interactive/caching/__init__.py b/sdks/python/apache_beam/runners/interactive/caching/__init__.py index 97b1be9..cce3aca 100644 --- a/sdks/python/apache_beam/runners/interactive/caching/__init__.py +++ b/sdks/python/apache_beam/runners/interactive/caching/__init__.py @@ -14,4 +14,3 @@ # See the License for the specific language governing permissions and # limitations under the License. # -from apache_beam.runners.interactive.caching.streaming_cache import StreamingCache diff --git a/sdks/python/apache_beam/runners/interactive/caching/expression_cache.py b/sdks/python/apache_beam/runners/interactive/caching/expression_cache.py new file mode 100644 index 000..5b1b9ef --- /dev/null +++ b/sdks/python/apache_beam/runners/interactive/caching/expression_cache.py @@ -0,0 +1,109 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import * + +import apache_beam as beam +from apache_beam.dataframe import convert +from apache_beam.dataframe import expressions + + +class ExpressionCache(object): + """Utility class for caching deferred DataFrames expressions. + + This is cache is currently a light-weight wrapper around the + TO_PCOLLECTION_CACHE in the beam.dataframes.convert module and the + computed_pcollections in the interactive module. + + Example:: + +df : beam.dataframe.DeferredDataFrame = ... +... +cache = ExpressionCache() +cache.replace_with_cached(df._expr) + + This will automatically link the instance to the existing caches. After it is + created, the cache can then be used to modify an existing deferred dataframe + expression tree to replace nodes with computed PCollections. + + This object can be created and destroyed whenever. This class holds no state + and the only side-effect is modifying the given expression. + """ + def __init__(self, pcollection_cache=None, computed_cache=None): +from apache_beam.runners.interactive import interactive_environment as ie + +self._pcollection_cache = ( +convert.TO_PCOLLECTION_CACHE +if pcollection_cache is None else pcollection_cache) +self._computed_cache = ( +ie.current_env().computed_pcollections +if computed_cache is None else computed_cache) + + def replace_with_cached( + self, expr: expressions.Expression) -> Dict[str, expressions.Expression]: +"""Replaces any previously computed expressions with PlaceholderExpressions. + +This is used to correctly read any expressions that were cached in previous +runs. This enables the InteractiveRunner to prune off old calculations from +the expression tree. +""" + +replaced_inputs: Dict[str, expressions.Expression] = {} +self._replace_with_cached_recur(expr, replaced_inputs) +return replaced_inputs + + def _replace_with_cached_recur( + self, + expr: expressions.Expression, + replaced_inputs: Dict[str, expressions.Expression]) -> None: +"""R
[beam] branch master updated: Updated screendiff golden screenshots for Linux platforms.
This is an automated email from the ASF dual-hosted git repository. ningk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new 97cf94c Updated screendiff golden screenshots for Linux platforms. new 3e933b5 Merge pull request #15171 from KevinGG/fix_dep 97cf94c is described below commit 97cf94cf65a9fb0566c771d0d667bc1b4e028f7c Author: KevinGG AuthorDate: Wed Jul 14 15:11:38 2021 -0700 Updated screendiff golden screenshots for Linux platforms. --- .../Linux/29c9237ddf4f3d5988a503069b4d3c47.png | Bin 64591 -> 65511 bytes .../Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png | Bin 679724 -> 728121 bytes 2 files changed, 0 insertions(+), 0 deletions(-) diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/29c9237ddf4f3d5988a503069b4d3c47.png b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/29c9237ddf4f3d5988a503069b4d3c47.png index 96ed442..44cfc70 100644 Binary files a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/29c9237ddf4f3d5988a503069b4d3c47.png and b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/29c9237ddf4f3d5988a503069b4d3c47.png differ diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png index e2e1ad4..aa98c62 100644 Binary files a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png and b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png differ
[beam] branch master updated: Misc Fixes
This is an automated email from the ASF dual-hosted git repository. ningk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new 76ee783 Misc Fixes new 3f3a7fb Merge pull request #15163 from KevinGG/fix_dep 76ee783 is described below commit 76ee7830602aac64f00bc12db4c0be818b7d699a Author: KevinGG AuthorDate: Mon Jul 12 17:23:59 2021 -0700 Misc Fixes 1. Moved google-api-core from 1.22.0 to 1.22.2 since the newly added google-cloud-recommendations-ai's upperbound 0.2.0 has requirement google-api-core[grpc]<2.0.0dev,>=1.22.2. 2. Renamed screen_diff_tests.py to screen_diff_test.py so that pytest can recognize the tests. Updated the screenshots for Darwin. --- .../Darwin/29c9237ddf4f3d5988a503069b4d3c47.png| Bin 0 -> 67298 bytes .../Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png| Bin 748019 -> 760584 bytes .../{screen_diff_tests.py => screen_diff_test.py} | 14 +- sdks/python/container/base_image_requirements.txt | 2 +- 4 files changed, 10 insertions(+), 6 deletions(-) diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/29c9237ddf4f3d5988a503069b4d3c47.png b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/29c9237ddf4f3d5988a503069b4d3c47.png new file mode 100644 index 000..8463b3f Binary files /dev/null and b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/29c9237ddf4f3d5988a503069b4d3c47.png differ diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png index 08de5a4..2179619 100644 Binary files a/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png and b/sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Darwin/7a35f487b2a5f3a9b9852a8659eeb4bd.png differ diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_tests.py b/sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_test.py similarity index 81% rename from sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_tests.py rename to sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_test.py index bcff805..0d36c88 100644 --- a/sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_tests.py +++ b/sdks/python/apache_beam/runners/interactive/testing/integration/tests/screen_diff_test.py @@ -23,9 +23,6 @@ import unittest import pytest from apache_beam.runners.interactive.testing.integration.screen_diff import BaseTestCase -from selenium.webdriver.common.by import By -from selenium.webdriver.support import expected_conditions -from selenium.webdriver.support.ui import WebDriverWait @pytest.mark.timeout(300) @@ -35,8 +32,15 @@ class DataFramesTest(BaseTestCase): super(DataFramesTest, self).__init__(*args, **kwargs) def explicit_wait(self): -WebDriverWait(self.driver, 5).until( -expected_conditions.presence_of_element_located((By.ID, 'test-done'))) +try: + from selenium.webdriver.common.by import By + from selenium.webdriver.support import expected_conditions + from selenium.webdriver.support.ui import WebDriverWait + + WebDriverWait(self.driver, 5).until( + expected_conditions.presence_of_element_located((By.ID, 'test-done'))) +except: + pass # The test will be ignored. def test_dataframes(self): self.assert_notebook('dataframes') diff --git a/sdks/python/container/base_image_requirements.txt b/sdks/python/container/base_image_requirements.txt index 04efbf5..2926ff1 100644 --- a/sdks/python/container/base_image_requirements.txt +++ b/sdks/python/container/base_image_requirements.txt @@ -43,7 +43,7 @@ typing-extensions==3.7.4.3 # GCP extra features google-auth==1.31.0 -google-api-core==1.22.0 +google-api-core==1.22.2 google-apitools==0.5.31 google-cloud-pubsub==1.0.2 google-cloud-bigquery==1.26.1
[beam] branch master updated (fc619bd -> 2944e48)
This is an automated email from the ASF dual-hosted git repository. ningk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from fc619bd Merge pull request #15158 from lazylynx/fix/unintended_docstring_removal new c440ed6 [BEAM-12531] Compat changes for deferred dataframes with ib.show new 79e4db9 address linter new f82faed address linter new e616db2 add license to dataframes.ipynb new 2944e48 Merge pull request #15097 from rohdesamuel/int-df-show The 32475 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../interactive/display/pcoll_visualization.py | 20 +-- .../runners/interactive/interactive_beam.py| 42 --- .../runners/interactive/interactive_beam_test.py | 22 .../runners/interactive/recording_manager.py | 7 +++ .../Linux/29c9237ddf4f3d5988a503069b4d3c47.png | Bin 0 -> 64591 bytes .../Linux/7a35f487b2a5f3a9b9852a8659eeb4bd.png | Bin 728121 -> 679724 bytes .../integration/test_notebooks/dataframes.ipynb} | 58 +++-- ...it_square_cube_test.py => screen_diff_tests.py} | 17 ++ 8 files changed, 115 insertions(+), 51 deletions(-) create mode 100644 sdks/python/apache_beam/runners/interactive/testing/integration/goldens/Linux/29c9237ddf4f3d5988a503069b4d3c47.png copy sdks/python/apache_beam/runners/interactive/{examples/Interactive Beam Running on Flink.ipynb => testing/integration/test_notebooks/dataframes.ipynb} (58%) rename sdks/python/apache_beam/runners/interactive/testing/integration/tests/{init_square_cube_test.py => screen_diff_tests.py} (69%)
[beam] branch master updated (e6e9b83 -> 4e0fbad)
This is an automated email from the ASF dual-hosted git repository. ningk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from e6e9b83 Prefetch subsequent pages over the FnAPI. (#14803) new 8255137 [BEAM-10708] Save and load coders correctly new e95b18a Fixed lint new 19aaad2 Fixed the cache_manager write implementation. new e7bb8b9 Fixed lint new 6781ce8 Fixed lint new 4e0fbad Merge pull request #15023 from KevinGG/sql The 32309 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../runners/interactive/background_caching_job.py | 6 ++- .../runners/interactive/cache_manager.py | 33 ++- .../runners/interactive/cache_manager_test.py | 37 ++--- .../runners/interactive/caching/streaming_cache.py | 47 +++--- .../interactive/caching/streaming_cache_test.py| 19 + .../interactive/pipeline_instrument_test.py| 6 +-- sdks/python/setup.py | 2 +- 7 files changed, 97 insertions(+), 53 deletions(-)
[beam] branch master updated: Use functools.wraps in @progress_indicator
This is an automated email from the ASF dual-hosted git repository. ningk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new 0cafed1 Use functools.wraps in @progress_indicator new fd9a6bd Merge pull request #14957 from TheNeuralBit/interactive-beam-docs 0cafed1 is described below commit 0cafed17e295d6c6fd2ba77d0c31348b3938fd22 Author: Brian Hulette AuthorDate: Mon Jun 7 11:43:57 2021 -0700 Use functools.wraps in @progress_indicator --- sdks/python/apache_beam/runners/interactive/utils.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sdks/python/apache_beam/runners/interactive/utils.py b/sdks/python/apache_beam/runners/interactive/utils.py index 878ec23..bbe88ff 100644 --- a/sdks/python/apache_beam/runners/interactive/utils.py +++ b/sdks/python/apache_beam/runners/interactive/utils.py @@ -18,6 +18,7 @@ """Utilities to be used in Interactive Beam. """ +import functools import hashlib import json import logging @@ -240,6 +241,7 @@ def progress_indicated(func): """A decorator using a unique progress indicator as a context manager to execute the given function within.""" + @functools.wraps(func) def run_within_progress_indicator(*args, **kwargs): with ProgressIndicator('Processing...', 'Done.'): return func(*args, **kwargs)
[beam] branch master updated (985e2f0 -> 87effb2)
This is an automated email from the ASF dual-hosted git repository. ningk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 985e2f0 Merge pull request #14543: [BEAM-12172] Bump gradle to 6.8.3 new 9aec93f [BEAM-12178] Fix flakiness new b03e1dd Added an onerror warning users about temp files not deleted by recording manager. new 87effb2 Merge pull request #14558 from KevinGG/BEAM-12178 The 31642 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../runners/interactive/caching/streaming_cache.py | 11 ++- .../runners/interactive/recording_manager_test.py | 13 + 2 files changed, 19 insertions(+), 5 deletions(-)
[beam] branch master updated (bb948c1 -> 06f6050)
This is an automated email from the ASF dual-hosted git repository. ningk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from bb948c1 [BEAM-12029] Make WontImplementErrors more helpful (#14517) new fd22c67 Don't use fake coders in interactive Beam. new 80d2364 Formatting fixes new 206ca1f Formatting fixes new 6e24d1d Ran yapf on changes. new 06f6050 Merge pull request #14561 from dmkozh/fake_coders The 31630 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../runners/interactive/background_caching_job.py | 5 ++-- .../runners/interactive/pipeline_fragment.py | 7 +++--- .../runners/interactive/pipeline_fragment_test.py | 4 ++-- .../runners/interactive/pipeline_instrument.py | 18 +++--- .../interactive/pipeline_instrument_test.py| 28 +- 5 files changed, 26 insertions(+), 36 deletions(-)
[beam] branch master updated (33cd75f -> e4e39e4)
This is an automated email from the ASF dual-hosted git repository. ningk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 33cd75f Merge pull request #14465 from KevinGG/BEAM-11045 new 5ddae82 [BEAM-10708] Read/Write Intermediate PCollections new e45705d Fix lint new 0e9d4f9 Fix based on comments new 6d26c75 Added clear method to InMemoryCache because tests might be flaky when a clear is issued in recording manager tests. new 719318b Avoid using interactive_environment module in the test because ie.current_env() is raced by threads/processes in the test environment on Jenkins. new dd87e14 Added back the setUp as additional cleanup routine before each test. new e4e39e4 Merge pull request #14368 from KevinGG/portable_pin_2 The 31431 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../runners/interactive/augmented_pipeline.py | 132 +++ .../runners/interactive/augmented_pipeline_test.py | 86 ++ .../runners/interactive/caching/read_cache.py | 148 .../runners/interactive/caching/read_cache_test.py | 91 ++ .../runners/interactive/caching/write_cache.py | 188 + .../interactive/caching/write_cache_test.py| 85 ++ .../interactive/testing/test_cache_manager.py | 4 + 7 files changed, 734 insertions(+) create mode 100644 sdks/python/apache_beam/runners/interactive/augmented_pipeline.py create mode 100644 sdks/python/apache_beam/runners/interactive/augmented_pipeline_test.py create mode 100644 sdks/python/apache_beam/runners/interactive/caching/read_cache.py create mode 100644 sdks/python/apache_beam/runners/interactive/caching/read_cache_test.py create mode 100644 sdks/python/apache_beam/runners/interactive/caching/write_cache.py create mode 100644 sdks/python/apache_beam/runners/interactive/caching/write_cache_test.py
[beam] branch master updated: [BEAM-11045] Avoid broken deps
This is an automated email from the ASF dual-hosted git repository. ningk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new 31efffe [BEAM-11045] Avoid broken deps new 33cd75f Merge pull request #14465 from KevinGG/BEAM-11045 31efffe is described below commit 31efffee2186aa65a3f2ece95b6813fea29ef003 Author: KevinGG AuthorDate: Wed Apr 7 15:00:56 2021 -0700 [BEAM-11045] Avoid broken deps 1. A new release: jupyter-client 6.1.13 breaks notebooks when getting messages. 2. This is caught by screen diff integration tests of interactive beam. 3. The issue is documented at https://github.com/jupyter/jupyter_client/issues/637 ETA of the fix might be 6.2.x of jupyter-client. 4. Added a timeout=-1 to allow indefinite wait for execution of each notebook cell in integration tests. The default value was 1 second. Instead of timeout each cell, moved the timeout to each test using pytest mark. --- .../runners/interactive/testing/integration/notebook_executor.py | 3 ++- .../interactive/testing/integration/tests/init_square_cube_test.py| 3 +++ sdks/python/setup.py | 4 +++- 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/notebook_executor.py b/sdks/python/apache_beam/runners/interactive/testing/integration/notebook_executor.py index 1051722..8478735 100644 --- a/sdks/python/apache_beam/runners/interactive/testing/integration/notebook_executor.py +++ b/sdks/python/apache_beam/runners/interactive/testing/integration/notebook_executor.py @@ -82,7 +82,8 @@ class NotebookExecutor(object): for path in self._paths: with open(path, 'r') as nb_f: nb = nbformat.read(nb_f, as_version=4) -ep = ExecutePreprocessor(allow_errors=True, kernel_name='test') +ep = ExecutePreprocessor( +timeout=-1, allow_errors=True, kernel_name='test') ep.preprocess(nb, {'metadata': {'path': os.path.dirname(path)}}) execution_id = obfuscate(path) diff --git a/sdks/python/apache_beam/runners/interactive/testing/integration/tests/init_square_cube_test.py b/sdks/python/apache_beam/runners/interactive/testing/integration/tests/init_square_cube_test.py index 95d94f4..5cd9184 100644 --- a/sdks/python/apache_beam/runners/interactive/testing/integration/tests/init_square_cube_test.py +++ b/sdks/python/apache_beam/runners/interactive/testing/integration/tests/init_square_cube_test.py @@ -22,9 +22,12 @@ from __future__ import absolute_import import unittest +import pytest + from apache_beam.runners.interactive.testing.integration.screen_diff import BaseTestCase +@pytest.mark.timeout(300) class InitSquareCubeTest(BaseTestCase): def __init__(self, *args, **kwargs): kwargs['golden_size'] = (1024, 1) diff --git a/sdks/python/setup.py b/sdks/python/setup.py index b68c690..775f321 100644 --- a/sdks/python/setup.py +++ b/sdks/python/setup.py @@ -206,7 +206,9 @@ INTERACTIVE_BEAM = [ 'facets-overview>=1.0.0,<2', 'ipython>=5.8.0,<8', 'ipykernel>=5.2.0,<6', -'jupyter-client>=6.1.11,<7', +# Skip version 6.1.13 due to +# https://github.com/jupyter/jupyter_client/issues/637 +'jupyter-client>=6.1.11,<6.1.13', 'timeloop>=1.0.2,<2', ]
[beam] branch master updated (1641aae -> 0805b9b)
This is an automated email from the ASF dual-hosted git repository. ningk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 1641aae [BEAM-11585] Select.flattenedSchema doesn't flatten nested array fields (#14354) new 110f568 [BEAM-12096] Attempt to fix flaky test new e3f5eb8 Added logging of potential ImportError new 5b27bc8 Use PropertyMock to replace the global singleton current_env() new 79578bc Changed warning logs about not in REPL env to error level and fixed a typo. new 0805b9b Merge pull request #14437 from KevinGG/BEAM-12096 The 31392 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../runners/interactive/interactive_environment.py | 2 +- .../apache_beam/runners/interactive/utils.py | 14 +-- .../apache_beam/runners/interactive/utils_test.py | 49 -- 3 files changed, 39 insertions(+), 26 deletions(-)