[GitHub] beam pull request #2519: [BEAM-1925] Updates DoFn invocation logic to be mor...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/2519 [BEAM-1925] Updates DoFn invocation logic to be more extensible. Adds following abstractions. DoFnSignature: describes the signature of a given DoFn object. DoFnInvoker: defines a particular way for invoking DoFn methods. I believe existing tests cover the updated code paths. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam sdf_direct_runner2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/2519.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2519 commit ea542113b8936cba2295e61471218a5c01be9a58 Author: chamik...@google.com Date: 2017-04-07T20:41:28Z Updates DoFn invocation logic to be more extensible. Adds following abstractions. DoFnSignature: describes the signature of a given DoFn object. DoFnInvoker: defines a particular way for invoking DoFn methods. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
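The two abstractions named in this description can be pictured with a small stand-alone sketch. It is not the code from the pull request: the `DoFn` base class below is a stand-in, and `SimpleInvoker` and `invoke_bundle` are hypothetical names chosen for illustration. The signature object records which lifecycle methods a DoFn overrides, and an invoker decides how to call them.

```python
class DoFn:
    """Stand-in base class for illustration; not the Beam class."""

    def start_bundle(self):
        pass

    def process(self, element):
        raise NotImplementedError

    def finish_bundle(self):
        pass


class DoFnSignature:
    """Describes which lifecycle methods a given DoFn object overrides."""

    METHODS = ("start_bundle", "process", "finish_bundle")

    def __init__(self, do_fn):
        self.do_fn = do_fn
        # A method counts as overridden when the subclass defines its own function.
        self.overridden = {
            name for name in self.METHODS
            if getattr(type(do_fn), name) is not getattr(DoFn, name)
        }


class SimpleInvoker:
    """One hypothetical way of invoking DoFn methods: call only what is overridden."""

    def __init__(self, signature):
        self.signature = signature

    def invoke_bundle(self, elements):
        fn = self.signature.do_fn
        outputs = []
        if "start_bundle" in self.signature.overridden:
            fn.start_bundle()
        for element in elements:
            outputs.extend(fn.process(element) or [])
        if "finish_bundle" in self.signature.overridden:
            fn.finish_bundle()
        return outputs


class SplitWords(DoFn):
    def process(self, element):
        return element.split()


signature = DoFnSignature(SplitWords())
invoker_output = SimpleInvoker(signature).invoke_bundle(["a b", "c"])
```

Separating the description of a DoFn from the way it is called lets a different invoker reuse the same signature object.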
[GitHub] beam pull request #2536: [BEAM-1179] Renames assertions of source_test_utils
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/2536 [BEAM-1179] Renames assertions of source_test_utils Renames assertions of source_test_utils from camelcase to underscore-separated. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam rename_sourcetestutil_asserts Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/2536.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2536 commit 82ba164b6f0ca69abbc707163232fa5b5791dc9a Author: chamik...@google.com Date: 2017-04-14T01:57:04Z Update assertions of source_test_utils from camelcase to underscore-separated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] beam pull request #3701: Updates BEAM_CONTAINER_VERSION to 2.2.0.
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3701 Updates BEAM_CONTAINER_VERSION to 2.2.0. Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes. - [ ] Each commit in the pull request should have a meaningful subject line and body. - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue. - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why. - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam update_container_version_2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3701.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3701 commit cc7b4da2f88c0e5fdfc27c0588d0cc66a489a928 Author: chamik...@google.com Date: 2017-08-08T06:47:57Z Updates BEAM_CONTAINER_VERSION to 2.2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] beam pull request #3715: [BEAM-2711] Updates ByteKeyRangeTracker so that get...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3715 [BEAM-2711] Updates ByteKeyRangeTracker so that getFractionConsumed() does not fail for completed trackers After this update: * getFractionConsumed() returns 1.0 after markDone() is set. * getFractionConsumed() returns 1.0 after tryReturnRecordAt() is invoked for a position that is larger than or equal to the end key. This is similar to how getFractionConsumed() method of OffsetRangeTracker is implemented. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam key_range_progress Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3715.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3715 commit ba08ec3bfb1eead06772945ab888d910ffe7d436 Author: chamik...@google.com Date: 2017-08-11T00:35:37Z Updates ByteKeyRangeTracker so that getFractionConsumed() does not fail for completed trackers. After this update: * getFractionConsumed() returns 1.0 after markDone() is set. * getFractionConsumed() returns 1.0 after tryReturnRecordAt() is invoked for a position that is larger than or equal to the end key. This is similar to how getFractionConsumed() method of OffsetRangeTracker is implemented. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
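The behavior described above can be illustrated with a toy tracker. This is a sketch in Python, not the Java ByteKeyRangeTracker: the class name, the method names in snake_case, and the assumption that keys are equal-length byte strings compared as big-endian integers are all made up for the example.

```python
class ToyKeyRangeTracker:
    """Toy key-range tracker; assumes equal-length keys read as big-endian integers."""

    def __init__(self, start_key: bytes, end_key: bytes):
        self.start = int.from_bytes(start_key, "big")
        self.end = int.from_bytes(end_key, "big")
        self.position = None
        self.done = False

    def try_return_record_at(self, key: bytes) -> bool:
        pos = int.from_bytes(key, "big")
        if pos >= self.end:
            # A position at or past the end key means the range is finished.
            self.done = True
            return False
        self.position = pos
        return True

    def mark_done(self) -> None:
        self.done = True

    def get_fraction_consumed(self) -> float:
        if self.done:
            return 1.0  # completed trackers report 1.0 instead of failing
        if self.position is None:
            return 0.0
        return (self.position - self.start) / (self.end - self.start)


tracker = ToyKeyRangeTracker(b"\x00", b"\x10")
tracker.try_return_record_at(b"\x04")
fraction_midway = tracker.get_fraction_consumed()
tracker.try_return_record_at(b"\x10")
fraction_after_end = tracker.get_fraction_consumed()

finished = ToyKeyRangeTracker(b"\x00", b"\x10")
finished.mark_done()
fraction_after_mark_done = finished.get_fraction_consumed()
```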
[GitHub] beam pull request #3731: Fixes a pydocs validation failure due to a recent c...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3731 Fixes a pydocs validation failure due to a recent commit. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam datastore_docs_failure Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3731.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3731 commit 8dc6e1666f3f113fe5ee854f4c7060e0fbd614e1 Author: chamik...@google.com Date: 2017-08-18T01:21:44Z Fixes a pydocs validation failure due to a recent commit.
[GitHub] beam pull request #3820: [BEAM-2545] Updates bigtable.version to 1.0.0-pre3.
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3820 [BEAM-2545] Updates bigtable.version to 1.0.0-pre3. Performs a slight update to BigtableServiceImpl to comply with the new version. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam update_bigtable_dependency Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3820.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3820 commit 27d95db2a22738d16177157b69f87deff58477db Author: chamik...@google.com Date: 2017-09-08T07:11:11Z Updates bigtable.version to 1.0.0-pre3. Performs a slight update to BigtableServiceImpl to comply with the new version.
[GitHub] beam pull request #2770: [BEAM-539] Fixes several issues of FileSink
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/2770 [BEAM-539] Fixes several issues of FileSink (1) Updates FileSink to fail for file name prefixes that only contain a single component (for example GCS buckets). For example, FileSink currently fails for gs://aaa while passing for gs://aaa/. This change makes FileSink fail for both cases (and makes the behavior consistent with Java). (2) Updates the name of the temporary directory created by FileSink. Currently, for a filename prefix 'gs://aaa/bbb', the temp path would be of the form gs://aaa/bbb-temp-... . This is error prone since a user pattern 'gs://aaa/bbb*' would match temp files. This change makes the temp path format 'gs://aaa/beam-temp-bbb-...' instead. To achieve the above, this PR adds a method 'split()' to the FileSystem interface that is analogous to Python's 'os.path.split()' (and which has the opposite effect of the current method FileSystem.join()). You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam gcs_root_location_file_sink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/2770.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2770 commit b66ae881a5adcdc5be8ee67a6d4ad842a2ea0147 Author: Chamikara Jayalath Date: 2017-04-28T21:38:35Z Fixes several issues of FileSink. (1) Updates FileSink to fail for file name prefixes that only contain a single component (for example GCS buckets). For example, FileSink currently fails for gs://aaa while passing for gs://aaa/. This change makes FileSink fail for both cases (and makes the behavior consistent with Java). (2) Updates the name of the temporary directory created by FileSink. Currently, for a filename prefix 'gs://aaa/bbb', the temp path would be of the form gs://aaa/bbb-temp-... . This is error prone since a user pattern 'gs://aaa/bbb*' would match temp files. This change makes the temp path format 'gs://aaa/beam-temp-bbb-...' instead.
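The naming change in (2) and the single-component check in (1) can be sketched as follows. This is an illustration only, not the FileSystem code from the pull request: `split_path` and `temp_dir_for` are hypothetical helpers, and the `suffix` argument stands in for the part of the temp name that the description elides with '...'.

```python
import fnmatch


def split_path(path: str):
    """Hypothetical split: 'gs://aaa/bbb' -> ('gs://aaa', 'bbb'), like os.path.split."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        raise ValueError("expected a path of the form scheme://...")
    rest = rest.rstrip("/")
    if "/" not in rest:
        # Only a single component (for example a bare bucket): nothing to split off.
        return f"{scheme}://{rest}", ""
    head, _, tail = rest.rpartition("/")
    return f"{scheme}://{head}", tail


def temp_dir_for(file_name_prefix: str, suffix: str) -> str:
    """Builds a temp directory name that a 'prefix*' pattern will not match."""
    head, tail = split_path(file_name_prefix)
    if not tail:
        raise ValueError("file name prefix must contain more than one component")
    return f"{head}/beam-temp-{tail}-{suffix}"


new_temp = temp_dir_for("gs://aaa/bbb", "123")
old_temp = "gs://aaa/bbb-temp-123"  # shape of the old format, for comparison
new_matches_user_pattern = fnmatch.fnmatch(new_temp, "gs://aaa/bbb*")
old_matches_user_pattern = fnmatch.fnmatch(old_temp, "gs://aaa/bbb*")


def fails_for(prefix: str) -> bool:
    try:
        temp_dir_for(prefix, "123")
    except ValueError:
        return True
    return False
```

With this layout the user pattern 'gs://aaa/bbb*' no longer picks up temporary files, and both 'gs://aaa' and 'gs://aaa/' are rejected in the same way.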
[GitHub] beam pull request #3661: [BEAM-2643] Adds two new Read PTransforms that can ...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3661 [BEAM-2643] Adds two new Read PTransforms that can be used to read a massive number of files textio.ReadAllFromText is for reading a PCollection of text files/file patterns. avroio.ReadAllFromAvro is for reading a PCollection of Avro files/file patterns. Most of the logic was generalized to a new PTransform filebasedsource.ReadAllFiles so that other file-based sources can be easily adapted to follow the same pattern. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam fileio_read_all Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3661.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3661 commit 5174e48a3e3ac495d452f438be916b3046ed1cf4 Author: chamik...@google.com Date: 2017-07-29T02:39:02Z Adds two new Read PTransforms that can be used to read a massive number of files. textio.ReadAllFromText is for reading a PCollection of text files/file patterns. avroio.ReadAllFromAvro is for reading a PCollection of Avro files/file patterns. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
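The "read all" pattern described here, where the input is itself a collection of file names or file patterns, can be shown in plain Python. This is only a conceptual sketch that uses the standard library: `read_all_from_text` is a hypothetical function, not the Beam transform, and it runs sequentially, whereas a PTransform would let a runner distribute the expansion and the reads.

```python
import glob
import os
import tempfile


def read_all_from_text(patterns):
    """Expands each file pattern, then yields every line of every matched file."""
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    yield line.rstrip("\n")


# Small demonstration with two temporary text files.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "1\n2\n"), ("b.txt", "3\n")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as handle:
            handle.write(text)
    lines = list(read_all_from_text([os.path.join(tmp, "*.txt")]))
```

Because the patterns arrive as data rather than as a construction-time argument, the set of files does not need to be known when the pipeline is built.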
[GitHub] beam pull request #3668: [BEAM-2141] Updates jenkins job for JDBCIOIT
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3668 [BEAM-2141] Updates jenkins job for JDBCIOIT This is a slightly updated version of Stephen Sisk's https://github.com/apache/beam/pull/3604. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam enable_jdbc_it Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3668.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3668 commit d61ca8f8845c4290e3e88dd9da2bd94605ab141b Author: chamik...@google.com Date: 2017-07-31T18:50:46Z Updates jenkins job for JDBCIOIT. This is a slightly updated version of Stephen Sisk's https://github.com/apache/beam/pull/3604.
[GitHub] beam pull request #3678: [BEAM-2708] Adds support for reading concatenated b...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3678 [BEAM-2708] Adds support for reading concatenated bzip2 files Adds tests for concatenated gzip and bzip2 files. Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' since it's actually hitting 'DummyReadTransform' and not testing this feature. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam pbzip2_test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3678.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3678 commit 40e1fbf1856190418d0c6c25c746037d4c109083 Author: chamik...@google.com Date: 2017-08-03T05:49:33Z Adds support for reading concatenated bzip2 files. Adds tests for concatenated gzip and bzip2 files. Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' since it's actually hitting 'DummyReadTransform' and not testing this feature. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
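A concatenated bzip2 file is several complete bzip2 streams written back to back. The sketch below uses only Python's standard `bz2` module and is not the code from the pull request; `decompress_concatenated` is a hypothetical helper. It shows why a reader has to start a new decompressor whenever one stream ends and bytes are still left over.

```python
import bz2


def decompress_concatenated(data: bytes) -> bytes:
    """Decompresses every bzip2 stream found back to back in `data`."""
    chunks = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise ValueError("truncated bzip2 stream")
        # Bytes after the end of this stream belong to the next stream, if any.
        data = decompressor.unused_data
    return b"".join(chunks)


payload = bz2.compress(b"first\n") + bz2.compress(b"second\n")

# A single decompressor stops at the end of the first stream.
first_stream_only = bz2.BZ2Decompressor().decompress(payload)
everything = decompress_concatenated(payload)
```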
[GitHub] beam pull request #3681: [BEAM-2708] Adds support for reading concatenated b...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3681 [BEAM-2708] Adds support for reading concatenated bzip2 files Cherry-picking into 2.1.0 release branch. Corresponding fix for Java SDK was already cherry picked into 2.1.0 branch. I think it's good to get the Python SDK fix in as well so that SDKs are consistent. Adds support for reading concatenated bzip2 files Adds tests for concatenated gzip and bzip2 files. Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' since it's actually hitting 'DummyReadTransform' and not testing this feature. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam bzip2_python_cherrypick Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3681.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3681 commit d6516c69e61f2061005d01a9e36ee1e4137a1478 Author: chamik...@google.com Date: 2017-08-03T05:49:33Z Adds support for reading concatenated bzip2 files. Adds tests for concatenated gzip and bzip2 files. Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' since it's actually hitting 'DummyReadTransform' and not testing this feature. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] beam pull request #1728: [BEAM-1239] Updates Python SDK examples to use Beam...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1728 [BEAM-1239] Updates Python SDK examples to use Beam text source Currently many Python SDK examples use the Dataflow native text source. This updates the examples to use the Beam text source available in textio.py. Additionally, this updates usages of the text sink to use textio.WriteToText() instead of io.TextFileSink (the latter usage is deprecated). This does not update snippets.py, which contains tests that dynamically modify the source/sink for testing, something that is hard to do for a custom text sink. That should be fixed in a separate CL. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam update_examples_to_use_text_source Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1728.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1728 commit 702a1d27089ba4df59c5b986923a50e7865e7a84 Author: Chamikara Jayalath Date: 2017-01-03T07:21:05Z Updates Python SDK examples to use Beam text source/sink.
[GitHub] beam pull request #1728: [BEAM-1239] Updates Python SDK examples to use Beam...
Github user chamikaramj closed the pull request at: https://github.com/apache/beam/pull/1728
[GitHub] beam pull request #1738: [Beam-564] Updates Python source API to allow repor...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1738 [Beam-564] Updates Python source API to allow reporting consumed and remaining number of split points With this update, the Python BoundedSource/RangeTracker API can report the consumed and remaining number of split points while performing a source read operation. The Java SDK source API already supports reporting these signals. These signals can be used by runner implementations, for example, to make scaling decisions. This provides a slightly simplified API compared to the previous PR https://github.com/apache/incubator-beam/pull/881. The main differences compared to https://github.com/apache/incubator-beam/pull/881 are as follows. (1) The set_done()/done() methods were removed from the RangeTracker interface. The downside is that RangeTracker will be unable to provide the signal that all records have been consumed. I think this signal is unnecessary since a runner can detect that anyway, because the reader loop of the source ends at that point. (2) The callback between BoundedSource and RangeTracker was changed from reporting the remaining number of split points to reporting the unclaimed number of split points. This makes the implementation of the callback simpler for source authors. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam limited_parallelism_updated Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1738.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1738 commit d3fb7d86fcf8c42b8da27943318d08b5e97c41c6 Author: Chamikara Jayalath Date: 2017-01-05T03:10:09Z Updates Python SDK source API so that sources can report limited parallelism signals. With this update, the Python BoundedSource/RangeTracker API can report the consumed and remaining number of split points while performing a source read operation, similar to Java SDK sources. These signals can be used by runner implementations, for example, to make scaling decisions.
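The relationship between the unclaimed-split-points callback and the reported consumed and remaining counts can be sketched with a toy tracker. This is not the SDK code: the class and its method names are illustrative, and counting the split point currently being read as "remaining" (unclaimed + 1) is an assumption made for the example.

```python
class ToyRangeTracker:
    """Toy offset tracker that reports (consumed, remaining) split points."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None
        self.consumed = 0
        self.unclaimed_callback = None

    def set_split_points_unclaimed_callback(self, callback):
        # The source tells the tracker how to count split points not yet claimed.
        self.unclaimed_callback = callback

    def try_claim(self, position):
        if position >= self.stop:
            return False
        if self.last_claimed is not None:
            # Claiming a new split point means the previous one is fully consumed.
            self.consumed += 1
        self.last_claimed = position
        return True

    def split_points(self):
        if self.unclaimed_callback is None:
            return self.consumed, None  # remaining count is unknown
        unclaimed = self.unclaimed_callback(self.stop)
        # Assumption: the split point being read still counts as remaining.
        return self.consumed, unclaimed + 1


def read(tracker, reports):
    """Toy reader loop: every integer offset is a record and a split point."""
    position = tracker.start
    tracker.set_split_points_unclaimed_callback(
        lambda stop: max(0, stop - position - 1))
    while tracker.try_claim(position):
        reports.append(tracker.split_points())
        position += 1


reports = []
read(ToyRangeTracker(0, 3), reports)
```

The source author only has to answer "how many split points have I not claimed yet", which is usually easier than computing the remaining count directly. The end of the reader loop is what signals that all records were consumed, so no separate done() call appears.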
[GitHub] beam pull request #1768: [BEAM-1239] Updates more examples to use Beam text ...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1768 [BEAM-1239] Updates more examples to use Beam text source/sink Updates snippets.py and custom_ptransform.py. Removes the dependency snippets_test.py has on Dataflow native text sink. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam update_snippets_textio Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1768.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1768 commit 30a018458f51a70c0e0d6e5431b219157af8a350 Author: Chamikara Jayalath Date: 2017-01-12T01:50:02Z Updates snippets to use Beam text source and sink. Removes the dependency snippets_test has on dataflow native text sink. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] beam pull request #1818: [BEAM-1298] Increments major used by Dataflow runne...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1818 [BEAM-1298] Increments major used by Dataflow runner to 5 Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam increment_major_version_5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1818.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1818 commit 1811458b0c33fba0dde909fc655452ad8a37c9f9 Author: Chamikara Jayalath Date: 2017-01-23T18:25:28Z Increments major version used by Dataflow runner to 5 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] beam pull request #1820: [BEAM-1299] Removes Dataflow native text source and...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1820 [BEAM-1299] Removes Dataflow native text source and sink from Beam Python SDK. Users should be using Beam text source and sink available in module 'textio.py' instead of this. Also removes Dataflow native file source/sink that is only used by native text source/sink. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam remove_native_text_source_sink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1820.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1820 commit ab6b7026da410a3084da58de873c6b8b809dd1fb Author: Chamikara Jayalath Date: 2017-01-23T21:23:45Z Removes Dataflow native text source and sink from Beam SDK. Users should be using Beam text source and sink available in module 'textio.py' instead of this. Also removes Dataflow native file source/sink that is only used by native text source/sink. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] beam pull request #1820: [BEAM-1299] Removes Dataflow native text source and...
Github user chamikaramj closed the pull request at: https://github.com/apache/beam/pull/1820
[GitHub] beam pull request #1820: [BEAM-1299] Removes Dataflow native text source and...
GitHub user chamikaramj reopened a pull request: https://github.com/apache/beam/pull/1820 [BEAM-1299] Removes Dataflow native text source and sink from Beam Python SDK. Users should be using Beam text source and sink available in module 'textio.py' instead of this. Also removes Dataflow native file source/sink that is only used by native text source/sink. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam remove_native_text_source_sink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1820.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1820 commit ab6b7026da410a3084da58de873c6b8b809dd1fb Author: Chamikara Jayalath Date: 2017-01-23T21:23:45Z Removes Dataflow native text source and sink from Beam SDK. Users should be using Beam text source and sink available in module 'textio.py' instead of this. Also removes Dataflow native file source/sink that is only used by native text source/sink. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] beam pull request #1820: [BEAM-1299] Removes Dataflow native text source and...
Github user chamikaramj closed the pull request at: https://github.com/apache/beam/pull/1820
[GitHub] beam pull request #1866: [BEAM-1338] Updates places in SDK that creates thre...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1866 [BEAM-1338] Updates places in the SDK that create thread pools. Moves ThreadPool creation to a util function. Records and restores the logging level, because apitools resets it when used with a ThreadPool. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam util_threadpool Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1866.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1866 commit 495fd12e2c899581a30bde1304cbf6c050dfd77f Author: Chamikara Jayalath Date: 2017-01-28T16:54:33Z Updates places in the SDK that create thread pools. Moves ThreadPool creation to a util function. Records and restores the logging level, because apitools resets it when used with a ThreadPool.
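The pattern described in this message (create the pool in one helper, remember the logging level, restore it afterwards) might look roughly like the sketch below. The helper name `run_with_thread_pool` and its parameters are hypothetical and are not taken from the pull request; only the standard library is used.

```python
import logging
from multiprocessing.pool import ThreadPool


def run_with_thread_pool(func, items, num_threads=4):
    """Hypothetical helper: maps func over items on a thread pool.

    The root logger level is recorded before the pool is used and restored
    afterwards, in case code running in the pool changes it.
    """
    root_logger = logging.getLogger()
    saved_level = root_logger.level  # record the level before the pool runs
    pool = ThreadPool(num_threads)
    try:
        return pool.map(func, items)
    finally:
        pool.close()
        pool.join()
        root_logger.setLevel(saved_level)  # restore the recorded level
```

Keeping pool creation in a single function means the save-and-restore step is written once instead of at every call site.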
[GitHub] beam pull request #1866: [BEAM-1338] Moves ThreadPool creation to a util fun...
GitHub user chamikaramj closed the pull request at: https://github.com/apache/beam/pull/1866
[GitHub] beam pull request #1890: [BEAM-564] Updates Python source API to allow repor...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1890 [BEAM-564] Updates Python source API to allow reporting consumed and remaining number of split points With this update, the Python BoundedSource/RangeTracker API can report the consumed and remaining number of split points while performing a source read operation. The Java SDK source API already supports reporting these signals. These signals can be used by runner implementations, for example, to make scaling decisions. This provides a slightly simplified API compared to the previous PR #881. The main differences compared to #881 are the following. (1) The set_done()/done() methods were removed from the RangeTracker interface. The downside is that RangeTracker will be unable to provide the signal that all records have been consumed. I think this signal is unnecessary, since a runner can detect that anyway because the reader loop of the source ends at that point. (2) The callback between BoundedSource and RangeTracker was changed from reporting the remaining number of split points to reporting the unclaimed number of split points. This makes the implementation of the callback simpler for source authors. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam limited_parallelism_updated Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1890.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1890 commit 8c5761f1647b11c9b60919bdecfa2ac77f4e491d Author: Chamikara Jayalath Date: 2017-01-05T03:10:09Z Updates Python SDK source API so that sources can report limited parallelism signals. With this update, the Python BoundedSource/RangeTracker API can report the consumed and remaining number of split points while performing a source read operation, similar to Java SDK sources. These signals can be used by runner implementations, for example, to make scaling decisions.
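To illustrate the distinction between "unclaimed" and "remaining" split points drawn in point (2), here is a toy tracker. It is a sketch only: the class `ToyOffsetTracker`, its method names, and the convention that the split point currently being processed counts as remaining rather than consumed are assumptions made for illustration, not the SDK implementation.

```python
class ToyOffsetTracker(object):
    """Hypothetical tracker over positions in [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None
        self._num_claimed = 0
        self._unclaimed_callback = None

    def set_split_points_unclaimed_callback(self, callback):
        # The source supplies a function: stop position -> unclaimed count.
        self._unclaimed_callback = callback

    def try_claim(self, position):
        if not self.start <= position < self.stop:
            return False
        self.last_claimed = position
        self._num_claimed += 1
        return True

    def split_points(self):
        """Returns (consumed, remaining); remaining is None if unknown."""
        if self._num_claimed == 0:
            return (0, None)
        # The split point being processed right now is not consumed yet.
        consumed = self._num_claimed - 1
        if self._unclaimed_callback is None:
            return (consumed, None)
        unclaimed = self._unclaimed_callback(self.stop)
        # Remaining = the current split point plus everything unclaimed.
        return (consumed, unclaimed + 1)


tracker = ToyOffsetTracker(0, 5)
# In this toy source every position is a split point, so the source can
# answer "how many are still unclaimed" with simple arithmetic.
tracker.set_split_points_unclaimed_callback(
    lambda stop: stop - tracker.last_claimed - 1)
for position in (0, 1, 2):
    tracker.try_claim(position)
print(tracker.split_points())  # (2, 3)
```

The source author only has to count what has not been claimed yet; the tracker derives the remaining figure from that.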
[GitHub] beam pull request #1916: [BEAM-1388] Updates default values used by retry de...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1916 [BEAM-1388] Updates default values used by retry decorator. Updates the following defaults so that the total wait time by default is more practical. num_retries from 16 to 7. max_delay_secs from 4 hours to 1 hour. With this update, when the maximum number of retries is used, the system waits for a total of 635 seconds, with a wait of 320 seconds before the last retry. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam update_retry_defaults Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1916.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1916 commit 640f5c61a25c100df0eca79b1a4417b81dbb9a83 Author: Chamikara Jayalath Date: 2017-02-04T01:32:49Z Updates default values used by retry decorator.
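The 635 and 320 second figures are consistent with exponential backoff that starts at 5 seconds and doubles on each retry. The initial delay of 5 seconds, the factor of 2, and the absence of random jitter are assumptions inferred from those figures and are not stated in the message. Under those assumptions, the arithmetic can be checked with a short sketch (the function name `backoff_delays` is hypothetical):

```python
def backoff_delays(num_retries=7, initial_delay_secs=5.0, factor=2.0,
                   max_delay_secs=3600.0):
    """Returns the wait before each retry, capped at max_delay_secs.

    initial_delay_secs and factor are assumed values, chosen because they
    reproduce the totals quoted in the pull request description.
    """
    delays = []
    delay = initial_delay_secs
    for _ in range(num_retries):
        delays.append(min(delay, max_delay_secs))
        delay *= factor
    return delays


delays = backoff_delays()
print(delays[-1], sum(delays))  # 320.0 635.0
```

With the same assumed parameters, the previous defaults (16 retries, 4 hour cap) would add up to 78075 seconds, roughly 21.7 hours, which suggests why the old total was considered impractical.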
[GitHub] beam pull request #1932: [BEAM-1406] Removes deprecated fileio.TextFileSink
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1932 [BEAM-1406] Removes deprecated fileio.TextFileSink Users should be using textio.WriteToText() transform instead of fileio.TextFileSink. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam remove_textfilesink_fileio Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1932.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1932 commit 2fed64bf5f3e7eda2a3a372556851cdbffeb1a1a Author: Chamikara Jayalath Date: 2017-02-07T00:01:11Z Removes deprecated fileio.TextFileSink. Users should be using textio.WriteToText() transform instead of fileio.TextFileSink.
[GitHub] beam pull request #4147: [BEAM-3209] Clarify documentation on support for re...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/4147 [BEAM-3209] Clarify documentation on support for reading from/writing to time partitioned BQ tables. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam clarify_time_partitioned_documentation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/4147.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4147 commit 39dbcda6fd45240aa4d7c1c04438896a9a114b2c Author: chamik...@google.com Date: 2017-11-17T23:29:57Z Clarify documentation on support for reading from/writing to time partitioned BQ tables.
[GitHub] beam pull request #1978: [BEAM-1463] Updates BigQuery read transform to hand...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/1978 [BEAM-1463] Updates BigQuery read transform to handle 'null' fields properly for DirectRunner Updates BigQuery read transform so that DirectRunner handles 'null' fields properly. Before this change, for DirectRunner, a record (dictionary) returned by the BigQuery read transform did not contain keys for fields that are 'null'. For DataflowRunner, these fields are available with value 'None'. I believe that retaining these fields with value 'None' is the proper behavior here. This change makes these two runners consistent when it comes to handling BigQuery 'null' values. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam reading_null_fields_directrunner Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/1978.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1978 commit f6a94610b674760075c3c0af66b7b03da154f2bc Author: Chamikara Jayalath Date: 2017-02-10T22:19:53Z Updates BigQuery read transform so that DirectRunner handles 'null' fields properly. Before this change, for DirectRunner, a record (dictionary) returned by the BigQuery read transform did not contain keys for fields that are 'null'. For DataflowRunner, these fields are available with value 'None'. I believe that retaining these fields with value 'None' is the proper behavior here. This change makes these two runners consistent when it comes to handling BigQuery 'null' values.
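The behavior difference described here can be pictured with a small standalone sketch. The function `fill_null_fields` and the sample field names are hypothetical and only illustrate the idea of keeping a key for every schema field; they are not the code from the pull request.

```python
def fill_null_fields(field_names, record):
    """Returns a copy of record with a key for every schema field.

    Fields absent from the raw record (BigQuery 'null') are kept with
    value None instead of being dropped.
    """
    return {name: record.get(name) for name in field_names}


schema_fields = ['name', 'age', 'city']   # hypothetical schema
raw_record = {'name': 'a', 'city': 'b'}   # 'age' was null, so it is absent
print(fill_null_fields(schema_fields, raw_record))
# {'name': 'a', 'age': None, 'city': 'b'}
```

Downstream code can then use `record['age']` on either runner without guarding against a missing key.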
[GitHub] beam-site pull request #186: Add chamikara as a committer
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam-site/pull/186 Add chamikara as a committer You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam-site website_add_to_team Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam-site/pull/186.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #186 commit a78aeafebc24996bf9a14fedc6b242a2db51eac6 Author: chamik...@google.com Date: 2017-03-18T00:28:21Z Add chamikara as a committer
[GitHub] beam pull request #2289: [BEAM-1782] Updates BigQuery read transform to corr...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/2289 [BEAM-1782] Updates BigQuery read transform to correctly process empty repeated fields. This fixes DirectRunner. DataflowRunner already processes these fields correctly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam bq_empty_repeated Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/2289.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2289 commit f3da5eb0f70f51c3e0b4b304b55d56cba7cd3f99 Author: chamik...@google.com Date: 2017-03-22T20:17:26Z Updates BigQuery read transform to correctly process empty repeated fields.
[GitHub] beam pull request #2289: [BEAM-1782] Updates BigQuery read transform to corr...
GitHub user chamikaramj closed the pull request at: https://github.com/apache/beam/pull/2289
[GitHub] beam pull request #3036: [BEAM-2241] Renames some python classes and functio...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3036 [BEAM-2241] Renames some python classes and functions that were unnecessarily public Adds a note to documentation of classes that are public but should be only used internally by the SDK (non-user facing classes). Marks some of the modules as experimental. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam update_public_api Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3036.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3036 commit e6f90ec7b8bd59bd6809edc1aa95e2e894dd2b84 Author: chamik...@google.com Date: 2017-05-10T02:56:14Z Renames some python classes and functions that were unnecessarily public. Adds a note to documentation of classes that are public but should be only used internally by the SDK (non-user facing classes). Marks some of the modules as experimental.
[GitHub] beam pull request #3041: [BEAM-2241] Renames some python classes and functio...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3041 [BEAM-2241] Renames some python classes and functions that were unnecessarily public Adds a note to documentation of classes that are public but should be only used internally by the SDK (non-user facing classes). Marks some of the modules as experimental. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam update_public_api_branch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3041.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3041 commit f4b80ea47fc3a3d4c1ba901e646c47981483eabd Author: chamik...@google.com Date: 2017-05-10T08:44:56Z Renames some python classes and functions that were unnecessarily public. Adds a note to documentation of classes that are public but should be only used internally by the SDK (non-user facing classes). Marks some of the modules as experimental.
[GitHub] beam pull request #3041: [BEAM-2241] Renames some python classes and functio...
GitHub user chamikaramj closed the pull request at: https://github.com/apache/beam/pull/3041
[GitHub] beam pull request #3074: [BEAM-1340] Adds __all__ tags to classes in package...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3074 [BEAM-1340] Adds __all__ tags to classes in package apache_beam/io You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam update_public_api_all Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3074.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3074 commit b5ff3ba87869aab31eb502d039c853c46e7ff818 Author: chamik...@google.com Date: 2017-05-11T05:33:35Z Adds __all__ tags to classes in package apache_beam/io.
[GitHub] beam pull request #3074: [BEAM-1340] Adds __all__ tags to modules in package...
GitHub user chamikaramj closed the pull request at: https://github.com/apache/beam/pull/3074
[GitHub] beam pull request #3089: [BEAM-1340] Adds __all__ tags to classes in package...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3089 [BEAM-1340] Adds __all__ tags to classes in package apache_beam/io. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam update_public_api_all Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3089.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3089 commit fe534068f58d2c96c3fbc2c94441b77c2e3e28a9 Author: chamik...@google.com Date: 2017-05-11T18:46:46Z Adds __all__ tags to classes in package apache_beam/io.
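For readers unfamiliar with `__all__`: a module-level `__all__` list controls which names `from module import *` exposes, which is one way to mark the public surface of a module. The sketch below builds a throwaway module in memory to show the effect; the module name `toy_io` and the class names are made up for illustration and do not come from `apache_beam/io`.

```python
import sys
import types

# Source of a made-up module that declares one public name.
source = '''
__all__ = ['ReadFromThing']

class ReadFromThing(object):
    pass

class UnlistedHelper(object):
    pass
'''

module = types.ModuleType('toy_io')
exec(source, module.__dict__)
sys.modules['toy_io'] = module  # make it importable by name

namespace = {}
exec('from toy_io import *', namespace)
exported = sorted(name for name in namespace if not name.startswith('__'))
print(exported)  # ['ReadFromThing']
```

`UnlistedHelper` is still reachable as `toy_io.UnlistedHelper`; `__all__` only limits what the star import pulls in and signals which names are meant to be public.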
[GitHub] beam-site pull request #253: [BEAM-3240] Improves development and testing in...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam-site/pull/253 [BEAM-3240] Improves development and testing instructions related to Python SDK Updates contribution guide to include development and testing instructions for Python SDK. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam-site contrib_guide_python Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam-site/pull/253.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #253 commit b9dd624d27afdcf5ce48f5def52f094fcd797acd Author: chamik...@google.com Date: 2017-05-26T00:12:50Z Updates contribution guide to include development and testing instructions for Python SDK.
[GitHub] beam pull request #3333: [BEAM-1630] Adds ability to dynamically replace PTr...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/ [BEAM-1630] Adds ability to dynamically replace PTransforms during runtime. Adds two new interfaces, PTransformMatcher and PTransformOverride. Currently only supports replacements where input and output types are an exact match (we have to address complexities due to type hints before supporting replacements with different types). This can be used to dynamically update a populated pipeline at runtime. Each runner can configure its own overrides. This will be used by SplittableDoFn, where matching ParDo transforms will be dynamically replaced by SplittableParDo. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam sdf_direct_runner_ptransform_override Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #
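The matcher/override split described in this message can be sketched without the SDK. Everything below is a simplified stand-in: the classes `ToyParDo` and `ToySplittableParDo`, the method names `match`, `get_matcher` and `get_replacement_transform`, and the flat list that plays the role of a pipeline are assumptions for illustration, not the interfaces added by the pull request.

```python
class ToyParDo(object):
    """Stand-in for a ParDo; 'splittable' marks a splittable DoFn."""

    def __init__(self, fn_name, splittable=False):
        self.fn_name = fn_name
        self.splittable = splittable


class ToySplittableParDo(object):
    """Stand-in for the replacement transform."""

    def __init__(self, fn_name):
        self.fn_name = fn_name


class SplittableMatcher(object):
    """Decides whether a transform should be replaced."""

    def match(self, transform):
        return isinstance(transform, ToyParDo) and transform.splittable


class SplittableOverride(object):
    """Pairs a matcher with a way to build the replacement."""

    def get_matcher(self):
        return SplittableMatcher()

    def get_replacement_transform(self, transform):
        return ToySplittableParDo(transform.fn_name)


def replace_all(transforms, overrides):
    """Returns a new list with every matching transform replaced."""
    result = []
    for transform in transforms:
        for override in overrides:
            if override.get_matcher().match(transform):
                transform = override.get_replacement_transform(transform)
        result.append(transform)
    return result


pipeline = [ToyParDo('parse'), ToyParDo('read_file', splittable=True)]
replaced = replace_all(pipeline, [SplittableOverride()])
print([type(t).__name__ for t in replaced])
# ['ToyParDo', 'ToySplittableParDo']
```

Because each runner passes its own list of overrides, the user-facing pipeline definition stays the same while the executed transforms differ per runner.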
[GitHub] beam pull request #3414: [BEAM-2494] Remove GroupedShuffleRangeTracker which...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3414 [BEAM-2494] Remove GroupedShuffleRangeTracker which is unused in the SDK You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam remove_grouped_shuffle_range_tracker Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3414.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3414 commit fbe89781bbf32421cbafe19313e6fbe070115dc2 Author: chamik...@google.com Date: 2017-06-21T17:37:11Z Remove GroupedShuffleRangeTracker which is unused in the SDK
[GitHub] beam pull request #3882: [BEAM-1630] Adds API for defining Splittable DoFns ...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3882 [BEAM-1630] Adds API for defining Splittable DoFns using Python SDK. See https://s.apache.org/splittable-do-fn-python-sdk for the design. This PR and the above doc were updated to reflect following recent updates to Splittable DoFn. * Support for ProcessContinuations * Support for dynamically updating output watermark irrespective of the output element production. This will be followed by a PR that adds support for reading Splittable DoFns using DirectRunner. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam sdf_api Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3882.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3882 commit 2fd11b1c0e212a1b267dbafd69c96e26fef4d319 Author: chamik...@google.com Date: 2017-09-22T00:43:11Z Adds API for defining Splittable DoFns. See https://s.apache.org/splittable-do-fn-python-sdk for the design. This PR and the above doc were updated to reflect following recent updates to Splittable DoFn. * Support for ProcessContinuations * Support for dynamically updating output watermark irrespective of the output element production. This will be followed by a PR that adds support for reading Splittable DoFns using DirectRunner. ---
[GitHub] beam pull request #3892: [BEAM-2985] Updates WriteToBigQuery PTransform to g...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3892 [BEAM-2985] Updates WriteToBigQuery PTransform to get project id from GoogleCloudOptions when using DirectRunner. WriteToBigQuery PTransform behaves differently for DirectRunner and DataflowRunner when it comes to determining the project that the output table belongs to. If a project is not specified, DataflowRunner defaults to GoogleCloudOptions.project while DirectRunner does not. This PR fixes this inconsistency by defaulting to GoogleCloudOptions.project for DirectRunner as well. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam bq_direct_runner_write Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3892.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3892 commit f99db7932cab90dda2741d22b291e7f1eaad7336 Author: chamik...@google.com Date: 2017-09-23T00:59:50Z Updates WriteToBigQuery PTransform to get project id from GoogleCloudOptions when using DirectRunner. WriteToBigQuery PTransform behaves differently for DirectRunner and DataflowRunner when it comes to determining the project that the output table belongs to. If a project is not specified, DataflowRunner defaults to GoogleCloudOptions.project while DirectRunner does not. This PR fixes this inconsistency by defaulting to GoogleCloudOptions.project for DirectRunner as well.
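The defaulting rule described above amounts to "use the project from the table specification if present, otherwise fall back to the pipeline-level project". A minimal standalone sketch follows; the function `resolve_table` and the `PROJECT:DATASET.TABLE` / `DATASET.TABLE` string forms it parses are assumptions for illustration and do not reproduce the SDK code.

```python
def resolve_table(table_spec, default_project):
    """Splits a table spec into (project, dataset, table).

    Accepts 'PROJECT:DATASET.TABLE' or 'DATASET.TABLE'. When the spec has
    no project part, default_project (for example the value taken from the
    pipeline options) is used instead.
    """
    project, separator, rest = table_spec.partition(':')
    if not separator:
        # No explicit project: fall back to the pipeline-level default.
        project, rest = default_project, table_spec
    if not project:
        raise ValueError('No project given for table %r' % table_spec)
    dataset, _, table = rest.partition('.')
    return project, dataset, table


print(resolve_table('logs.events', 'my-project'))
# ('my-project', 'logs', 'events')
print(resolve_table('other-project:logs.events', 'my-project'))
# ('other-project', 'logs', 'events')
```

Applying the same fallback on every runner means the same pipeline writes to the same table regardless of where it runs.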
[GitHub] beam pull request #3962: [Beam-3028] Fixes a bug in DatastoreIO query splitt...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3962 [Beam-3028] Fixes a bug in DatastoreIO query splitting. We were returning the original query instead of the sub-queries, which resulted in data duplication when reading. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam query_splitting Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3962.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3962 commit 636b56964b750fba025c42e260219b60b085a868 Author: chamik...@google.com Date: 2017-10-09T00:02:43Z Fixes a bug in query splitting. We were returning the original query instead of the sub-queries, which resulted in data duplication when reading.
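Why returning the original query once per split duplicates data can be shown with a toy model in which a "query" is just a half-open key range. The functions below are hypothetical and unrelated to the actual DatastoreIO code; they only reproduce the shape of the bug.

```python
def split_query(key_range, num_splits):
    """Splits a (start, stop) key range into contiguous sub-ranges."""
    start, stop = key_range
    if stop <= start:
        return []
    step = -(-(stop - start) // num_splits)  # ceiling division
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


def split_query_buggy(key_range, num_splits):
    """Shape of the bug: one copy of the ORIGINAL range per split."""
    return [key_range for _ in split_query(key_range, num_splits)]


def read(ranges):
    """Reads every key covered by each range, in order."""
    return [key for lo, hi in ranges for key in range(lo, hi)]


query = (0, 10)
print(len(read(split_query(query, 3))))        # 10
print(len(read(split_query_buggy(query, 3))))  # 30
```

In the buggy variant each worker reads the full range, so every key is read once per split.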
[GitHub] beam pull request #3996: [BEAM-3029] Sets userAgent option in BigTableReadIT
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3996 [BEAM-3029] Sets userAgent option in BigTableReadIT --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam bigtable-it Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3996.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3996 commit 4457928a3d7e82426ff6019642d4e846131201b4 Author: chamik...@google.com Date: 2017-10-16T07:50:03Z Sets userAgent option in BigTableReadIT ---
[GitHub] beam pull request #3998: [BEAM-3029] Sets user agent in BigTableIO.Read.getB...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/3998 [BEAM-3029] Sets user agent in BigTableIO.Read.getBigTableService(). Cherry-picking this commit to 2.2.0 release branch. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam bigtable_read_it_fix_cerrypick Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/3998.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3998 commit 25cab6be8d763a03d2a37f0647698cba79df6ac5 Author: chamik...@google.com Date: 2017-10-16T07:50:03Z Sets user agent in BigTableIO.Read.getBigTableService(). ---
[GitHub] beam pull request #4007: [BEAM-3065] Avoids generating proto files for Windo...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/4007 [BEAM-3065] Avoids generating proto files for Windows if grpcio-tools is not installed. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam avoid_proto_generation_windows Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/4007.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4007 commit 0ca6d3025c0479d7e6bd3a70bca84f651e717167 Author: chamik...@google.com Date: 2017-10-18T01:46:40Z Avoids generating proto files for Windows if grpcio-tools is not installed. ---
[GitHub] beam pull request #3998: [BEAM-3029] Sets user agent in BigTableIO.Read.getB...
GitHub user chamikaramj closed the pull request at: https://github.com/apache/beam/pull/3998 ---
[GitHub] beam pull request #4025: [BEAM-3088] Improves size estimation of BigQueryTab...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/4025 [BEAM-3088] Improves size estimation of BigQueryTableSource. Updates BigQueryTableSource to consider data in streaming buffer when determining estimated size. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam bq_size_estimation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/4025.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4025 commit 501b43800e95a8722315c43c7379725407d04f7c Author: chamik...@google.com Date: 2017-10-22T02:20:07Z Updates BigQueryTableSource to consider data in streaming buffer when determining estimated size. ---
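The size estimation change described above can be pictured with a rough sketch. It is not the BigQueryTableSource code (which is Java): `estimate_table_size_bytes` is a hypothetical helper, and the `numBytes` and `streamingBuffer.estimatedBytes` field names are an assumption about the table metadata returned by the BigQuery REST API, passed in here as a plain dictionary.

```python
from typing import Any, Mapping


def estimate_table_size_bytes(table: Mapping[str, Any]) -> int:
    """Estimate table size including rows still in the streaming buffer.

    `table` is assumed to look like BigQuery table metadata, where byte
    counts may arrive as strings and the streaming buffer may be absent.
    """
    stored = int(table.get("numBytes") or 0)
    streaming_buffer = table.get("streamingBuffer") or {}
    buffered = int(streaming_buffer.get("estimatedBytes") or 0)
    # Counting only `stored` would underestimate tables that receive
    # streaming inserts, which is the gap the change above addresses.
    return stored + buffered
```

For a table that has no streaming buffer the estimate reduces to the stored byte count, so the behaviour for batch-loaded tables is unchanged in this sketch.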
[GitHub] beam pull request #4064: [BEAM-1630] Adds support for processing Splittable ...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/4064 [BEAM-1630] Adds support for processing Splittable DoFns using DirectRunner. Updates DoFn invocation logic to allow invoking SDF methods. Adds SDF machinery that will be common to DirectRunner and other runners. Adds DirectRunner specific transform overrides, evaluators, and other logic for processing Splittable DoFns. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam sdf_direct_runner_3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/4064.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4064 commit 7549b5c2ebe2ae47af9066eaf97364a27e828ab5 Author: chamik...@google.com Date: 2017-10-31T08:16:43Z Adds support for processing Splittable DoFns using DirectRunner.
Updates DoFnInvocation logic to allow invoking SDF methods. Adds SDF machinery that will be common to DirectRunner and other runners. Adds DirectRunner specific transform overrides, evaluators, and other logic for processing Splittable DoFns. ---
[GitHub] beam pull request #4067: Updates Python datastore wordcount example to take ...
GitHub user chamikaramj opened a pull request: https://github.com/apache/beam/pull/4067 Updates Python datastore wordcount example to take a dataset parameter. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/beam update_datastore_example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/4067.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4067 commit 0d565d6d2a8e8c85089b2e8ea75eb768fa07d2df Author: chamik...@google.com Date: 2017-11-01T01:37:29Z Updates Python datastore wordcount example to take a dataset parameter. ---