[GitHub] incubator-beam pull request #1385: Fixes couple of issues of FileBasedSource...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/1385 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-beam pull request #1385: Fixes couple of issues of FileBasedSource...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/1385 Fixes a couple of issues in FileBasedSource. (1) Updates the code so that a user-specified coder is properly set on splits. (2) Currently each SingleFileSource holds a reference to the FileBasedSource, while the FileBasedSource holds a reference to a ConcatSource, and the ConcatSource holds a reference to the list of SingleFileSources. This results in quadratic space complexity when serializing the splits of a FileBasedSource. This CL fixes the issue by making sure that the FileBasedSource is cloned before it takes a reference to the ConcatSource. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam filebasedsource_manyfiles_fix1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1385.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1385 commit 0851a8fc2cf6214673777e31bdacaa3409547a6c Author: Chamikara Jayalath Date: 2016-11-18T03:18:26Z Fixes a couple of issues in FileBasedSource. (1) Updates the code so that a user-specified coder is properly set on sub-sources. (2) Currently each SingleFileSource holds a reference to the FileBasedSource, while the FileBasedSource holds a reference to a ConcatSource, and the ConcatSource holds a reference to the list of SingleFileSources. This results in quadratic space complexity when serializing the splits of a FileBasedSource. This CL fixes the issue by making sure that the FileBasedSource is cloned before it takes a reference to the ConcatSource.
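The reference cycle described in this pull request can be illustrated with a small, self-contained sketch. It is not the SDK code: plain dictionaries stand in for FileBasedSource, ConcatSource and SingleFileSource, the standard `pickle` module stands in for whatever serializer the runner uses, and the function name and file pattern are hypothetical.

```python
import copy
import pickle


def total_split_bytes(num_files, clone_parent):
    """Returns the total pickled size of all per-file splits."""
    # Stand-in for the FileBasedSource; it will later point at its splits.
    parent = {"pattern": "gs://example-bucket/part-*", "concat": None}
    # With the fix, splits reference a clone made *before* the parent
    # learns about the list of splits, so the clone never reaches that list.
    ref = copy.copy(parent) if clone_parent else parent
    splits = [{"file": "part-%05d" % i, "parent": ref}
              for i in range(num_files)]
    parent["concat"] = splits  # stand-in for the ConcatSource reference
    # Each split is serialized on its own, as if shipped to a worker.
    return sum(len(pickle.dumps(s)) for s in splits)
```

Without the clone, every pickled split drags along the whole list of splits, so the total grows roughly with the square of the number of files; with the clone, it grows roughly linearly.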
[GitHub] incubator-beam pull request #1267: Fixes two bugs in avroio_test 'test_corru...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/1267 Fixes two bugs in avroio_test 'test_corrupted_file'. (1) Updates the test to perform corruption properly (setting 'A' and 'B'). (2) Removes an invalid usage of bytearray(). You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam fix_avro_test_flake Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1267.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1267 commit a7e543a15f0db890a80777251719db2d05001ba2 Author: Chamikara Jayalath Date: 2016-11-02T21:33:09Z Fixes two bugs in avroio_test 'test_corrupted_file'. (1) Updates test to perform corruption properly (setting 'A' and 'B'). (2) Removes an invalid usage of bytearray().
[GitHub] incubator-beam pull request #1235: [BEAM-700] Improvements related to size e...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/1235
[GitHub] incubator-beam pull request #1235: [BEAM-700] Improvements related to size e...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/1235 [BEAM-700] Improvements related to size estimation. Updates FileBasedSource so that size estimation of glob patterns that expand into a large number of files is done using sampling. Updates Dataflow runner to set estimated sizes of sources when submitting jobs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam estimate_size_of_sources Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1235.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1235 commit 42b1d9f541a54ea49ec717934632e8e82c21e911 Author: chamik...@google.com Date: 2016-10-31T14:39:13Z Improvements related to size estimation. Updates FileBasedSource so that size estimation of glob patterns that expand into a large number of files is done using sampling. Updates Dataflow runner to set estimated sizes of sources when submitting jobs.
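A minimal sketch of sampling-based size estimation follows. The pull request does not spell out the sampling scheme, so the threshold, the sample size, and the names `estimate_total_size` and `size_of` are assumptions made for illustration only.

```python
import random


def estimate_total_size(file_names, size_of, max_exact=100,
                        sample_size=100, seed=0):
    """Estimates the combined size of the files matched by a glob.

    size_of is a caller-supplied function mapping a file name to its size
    in bytes (hypothetical; it stands in for a file system lookup).
    """
    count = len(file_names)
    if count <= max_exact:
        # Few files: an exact sum is cheap enough.
        return sum(size_of(name) for name in file_names)
    # Many files: look up only a random sample and scale the mean.
    rng = random.Random(seed)
    sample = rng.sample(file_names, min(sample_size, count))
    mean_size = sum(size_of(name) for name in sample) / float(len(sample))
    return int(round(mean_size * count))
```

The estimate is exact when all files have the same size and otherwise trades accuracy for far fewer size lookups.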
[GitHub] incubator-beam pull request #1090: [BEAM-737][BEAM-738] Updates source API d...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/1090
[GitHub] incubator-beam pull request #1090: [BEAM-737][BEAM-738] Updates source API d...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/1090 [BEAM-737][BEAM-738] Updates source API documentation to mention that sources should be immutable and updates existing sources accordingly. Updates source API documentation to mention that source objects should not be mutated. Updates textio._TextSource so that it does not get mutated while reading. Updates source_test_utils so that source objects do not get cloned while testing. This could help to catch sources that erroneously get modified while reading. Adds reentrancy tests for text and Avro sources. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam update_boundedsource_stateless Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1090.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1090 commit 5bd56f35d597b142463b1a141e2ac219e6902fc3 Author: Chamikara Jayalath Date: 2016-10-12T23:51:20Z Updates source API documentation to mention that source objects should not be mutated. Updates textio._TextSource so that it does not get mutated while reading. Updates source_test_utils so that source objects do not get cloned while testing. This could help to catch sources that erroneously get modified while reading. Adds reentrancy tests for text and Avro sources.
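The immutability requirement can be sketched with a toy source whose read() keeps all of its progress in local variables. The class below is hypothetical and far simpler than textio._TextSource; it only shows why a source that is not mutated while reading can be read by several readers at once.

```python
class LineSource(object):
    """Toy source: read() never writes to attributes of self."""

    def __init__(self, lines):
        # Stored once as an immutable tuple and never modified afterwards.
        self._lines = tuple(lines)

    def read(self):
        # Reading progress lives in a local variable, so concurrent or
        # repeated reads of the same source object cannot interfere.
        position = 0
        while position < len(self._lines):
            yield self._lines[position]
            position += 1
```

A reentrancy check in this style starts two reads of the same object, interleaves them, and expects both to produce the full, identical sequence.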
[GitHub] incubator-beam pull request #1058: Fixes a bug in avroio_test.py
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/1058
[GitHub] incubator-beam pull request #1058: Fixes a bug in avroio_test.py
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/1058 Fixes a bug in avroio_test.py Fixes a bug in avroio_test.py where we open a binary file without 'b' mode. Without it, the file can get corrupted on Windows and the test becomes flaky. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam fix_avro_test_windows Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1058.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1058 commit 6ed258e97a3bde460315b5aef1449c38f80dc564 Author: Chamikara Jayalath Date: 2016-10-05T23:23:09Z Fixes a bug in avroio_test.py where we open a binary file without 'b' mode. Without it, the file can get corrupted on Windows and the test becomes flaky.
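As a generic illustration of the fix (not the test code itself), binary data should be written and read with 'wb' and 'rb'. In text mode on Windows, newline translation can alter bytes such as `\n`, which is enough to corrupt an Avro file. The helper name below is hypothetical.

```python
import os
import tempfile


def write_and_read_binary(payload):
    """Writes payload to a temporary file and reads it back unchanged."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # 'wb' and 'rb' disable newline translation on every platform.
        with open(path, 'wb') as f:
            f.write(payload)
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)
```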
[GitHub] incubator-beam pull request #881: [BEAM-564] Updates sources to report consu...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/881
[GitHub] incubator-beam pull request #1002: [BEAM-614] Updates FileBasedSource to sup...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/1002 [BEAM-614] Updates FileBasedSource to support CompressionType.AUTO. Updates FileBasedSource to support CompressionType.AUTO. Fixes some test cases that were not being exercised properly. Adds tests for CompressionType.AUTO. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam fix_compressed_auto_2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1002.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1002 commit 37aa7a457be0c8bfef0bb5f1b6cc973ead93f7b1 Author: chamik...@google.com Date: 2016-09-26T04:44:34Z Updates filebasedsource to support CompressionType.AUTO.
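One plausible reading of an AUTO compression setting is that the codec is inferred from the file extension. The sketch below assumes that behaviour; the constants, the extension table and the function name are hypothetical and are not taken from the SDK.

```python
import os

# Hypothetical stand-ins for the SDK's compression type constants.
AUTO = 'auto'
GZIP = 'gzip'
BZIP2 = 'bzip2'
UNCOMPRESSED = 'uncompressed'

# Assumed extension-to-codec table; the real mapping may differ.
_EXTENSION_TO_TYPE = {'.gz': GZIP, '.bz2': BZIP2}


def resolve_compression(path, requested=AUTO):
    """Returns the compression type to use when reading path."""
    if requested != AUTO:
        # An explicit choice always wins over detection.
        return requested
    extension = os.path.splitext(path)[1].lower()
    return _EXTENSION_TO_TYPE.get(extension, UNCOMPRESSED)
```

Resolving the type per file, rather than once per glob, lets a single pattern match a mix of compressed and uncompressed files.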
[GitHub] incubator-beam pull request #987: Adds __all__ tags to source modules.
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/987 Adds __all__ tags to source modules. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam add_all_tags_to_source_modules Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/987.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #987 commit fbda5eeff89b65d8022fc32babb328386cae0bca Author: Chamikara Jayalath Date: 2016-09-22T16:08:11Z Adds __all__ tags to source modules.
[GitHub] incubator-beam pull request #978: [BEAM-643] Updates Dataflow API client.
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/978
[GitHub] incubator-beam pull request #975: [BEAM-643] Adds support for specifying a c...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/975
[GitHub] incubator-beam pull request #979: [BEAM-643] Updates lint configurations to ...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/979
[GitHub] incubator-beam pull request #979: [BEAM-643] Updates lint configurations to ...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/979 [BEAM-643] Updates lint configurations to ignore generated files. Adds ability to ignore certain generated files when running pylint and pep8. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam dataflow_pylint_exclude_files Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/979.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #979 commit c9a7434804cb84ced7de84794dc9070de4069db2 Author: Chamikara Jayalath Date: 2016-09-20T05:27:47Z Updates lint configurations to ignore generated files.
[GitHub] incubator-beam pull request #978: [BEAM-643] Updates Dataflow API client.
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/978 [BEAM-643] Updates Dataflow API client. Updates Cloud Dataflow API client files to the latest version. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam dataflow_client_api_update Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/978.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #978 commit a1cc0ba4e8eed75252587fc7cbd9aaec8c396f01 Author: Chamikara Jayalath Date: 2016-09-20T05:16:04Z Updates Dataflow API client.
[GitHub] incubator-beam pull request #975: [BEAM-643] Adds support for specifying a c...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/975 [BEAM-643] Adds support for specifying a custom service account. Adds support for specifying a custom service account when using DataflowPipelineRunner. Updates Dataflow API client to latest version. Adds ability to skip generated files during lint checks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam custom_service_account Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/975.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #975 commit 11ace177a238cbd8439b1ee8d13e83cf6285c304 Author: Chamikara Jayalath Date: 2016-09-19T22:52:53Z Adds support for specifying a custom service account. Updates Dataflow API client to latest version. Adds ability to skip generated files during lint checks.
[GitHub] incubator-beam pull request #920: [BEAM-553] Adds a text source for Python S...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/920
[GitHub] incubator-beam pull request #920: [BEAM-553] Adds a text source for Python S...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/920 [BEAM-553] Adds a text source for Python SDK. The current text source (fileio.TextFileSource) is specific to the Dataflow runner. This adds a runner-independent TextSource that is based on the iobase.BoundedSource interface. Adds a textio module that contains a text source, a text sink, and PTransforms that can be used to read and write text files. Adds a significant number of tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam text_source Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/920.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #920 commit bb1ff90307b563656a54731cada05e41cd9e82b8 Author: Chamikara Jayalath Date: 2016-08-30T01:08:46Z Adds a text source to Python SDK.
[GitHub] incubator-beam pull request #890: Updates SourceTestBase concurrent splittin...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/890 Updates SourceTestBase concurrent splitting test to share thread pool Updates the SourceTestBase concurrent splitting test to share a thread pool across runs. Without this, runs could fail in environments that prevent too many threads from being created. Makes some minor fixes to source_test_utils_test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam fix_source_utils_test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/890.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #890 commit b901c59a15ebe9171c3b216e666fc0f79a61429d Author: Chamikara Jayalath Date: 2016-08-26T06:18:42Z Updates the SourceTestBase concurrent splitting test to share a thread pool across runs. Without this, runs could fail in environments that prevent too many threads from being created. Makes some minor fixes to source_test_utils_test.
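The idea of sharing one thread pool across runs can be sketched as follows. This uses `concurrent.futures.ThreadPoolExecutor` from the standard library for illustration; the test utilities may use a different pool class, and the function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks(pool, inputs, check):
    """Runs check on every input using an existing, shared pool."""
    return list(pool.map(check, inputs))


def run_all(batches, check, max_workers=4):
    """Creates the pool once and reuses it for every batch of checks."""
    # Creating a new pool per batch would spawn up to max_workers threads
    # each time; sharing one pool bounds the total number of threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [run_checks(pool, batch, check) for batch in batches]
```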
[GitHub] incubator-beam pull request #881: Updates sources to report consumed and rem...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/881 Updates sources to report consumed and remaining number of split points. Adds several methods to the RangeTracker interface to support this. Please see comments for details. Updates AvroSource and LineSource (test) to report split points properly. Runners can use this information to determine the amount of remaining and consumed parallelism of source read operations. The Java SDK sources framework already supports reporting these signals. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam limited_parallelism Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/881.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #881 commit 8b78a7bfb945f80b346000bfbd39bb6fa4a933e3 Author: Chamikara Jayalath Date: 2016-08-24T22:31:47Z Updates sources to report consumed and remaining number of split points. Adds several methods to the RangeTracker interface to support this. Please see comments for details. Updates AvroSource and LineSource (test) to report split points properly. Runners can use this information to determine the amount of remaining and consumed parallelism of source read operations. The Java SDK sources framework already supports reporting these signals.
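A rough sketch of split-point reporting is shown below. It assumes the convention that the split point currently being read does not yet count as consumed, and that the remaining count may be unknown. The class, its method names and the sentinel are hypothetical and do not reproduce the RangeTracker interface.

```python
# Sentinel returned when the number of remaining split points is not known.
SPLIT_POINTS_UNKNOWN = object()


class SplitPointCounter(object):
    """Tracks how many split points a reader has claimed so far."""

    def __init__(self, total_split_points=None):
        self._total = total_split_points
        self._claimed = 0

    def claim_split_point(self):
        # Called by the reader each time it starts a new split point.
        self._claimed += 1

    def split_points(self):
        """Returns a (consumed, remaining) pair."""
        # The split point being read right now is not fully consumed yet.
        consumed = max(self._claimed - 1, 0)
        if self._total is None:
            return consumed, SPLIT_POINTS_UNKNOWN
        return consumed, self._total - consumed
```

A runner could treat the remaining count as an upper bound on how much further the read can still be parallelized.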
[GitHub] incubator-beam pull request #866: [BEAM-578] Updates FileBasedSource so that...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/866
[GitHub] incubator-beam pull request #866: [BEAM-578] Updates FileBasedSource so that...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/866 [BEAM-578] Updates FileBasedSource so that subclasses can prevent splitting. File patterns will be split into sources of individual files, but any further splitting into data ranges will be prevented. This prevents both initial and dynamic splitting. Introduces UnsplittableRangeTracker, which can be used to make any given RangeTracker object unsplittable. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam unsplittable_file_based_source Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/866.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #866 commit d0004369875a999e902a93d8df5f7839b7a56674 Author: Chamikara Jayalath Date: 2016-08-23T04:06:36Z Updates FileBasedSource so that subclasses can prevent splitting into data ranges. File patterns will be split into sources of individual files, but any further splitting into data ranges will be prevented. This prevents both initial and dynamic splitting. Introduces UnsplittableRangeTracker, which can be used to make any given RangeTracker object unsplittable.
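The wrapper idea can be sketched with two toy classes. Both are hypothetical simplifications written for this illustration and are not the SDK's RangeTracker or UnsplittableRangeTracker: the wrapper delegates claims to the wrapped tracker and refuses every split request.

```python
class SimpleOffsetTracker(object):
    """Toy tracker over the half-open offset range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self._last_claimed = None

    def try_claim(self, position):
        if position >= self.stop:
            return False
        self._last_claimed = position
        return True

    def try_split(self, position):
        # Only positions strictly inside the unread part can be split at.
        lower = self.start if self._last_claimed is None else self._last_claimed
        if position <= lower or position >= self.stop:
            return None
        old_stop, self.stop = self.stop, position
        return position, old_stop


class UnsplittableTracker(object):
    """Wraps any tracker and rejects all dynamic split requests."""

    def __init__(self, tracker):
        self._tracker = tracker

    def try_claim(self, position):
        return self._tracker.try_claim(position)

    def try_split(self, position):
        return None  # Splitting is never allowed.
```

Because the wrapper never shrinks the wrapped range, the reader keeps ownership of the whole file.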
[GitHub] incubator-beam pull request #779: [BEAM-522] Fixes GcsIO.exists() to properl...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/779 [BEAM-522] Fixes GcsIO.exists() to properly handle files that do not exist. Currently this invocation fails for nonexistent files instead of returning false. Updates FileSink.finalize_write() so that we capture and log any transient errors that get thrown at the channel_factory.exists() call. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam sink_finalize_fix_idempotency Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/779.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #779 commit 792c3b5c79b6e979bc34bcf457f8a33cebd74daf Author: Chamikara Jayalath Date: 2016-08-04T01:25:41Z Fixes GcsIO.exists() to properly handle files that do not exist. Currently this invocation fails for nonexistent files instead of returning false. Updates FileSink.finalize_write() so that we capture and log any transient errors that get thrown at the channel_factory.exists() call.
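The intended behaviour of an exists() check can be sketched as below. The error class and the `get_metadata` callable are hypothetical stand-ins for a storage client; the sketch assumes a missing object is signalled by an HTTP 404 status and that any other error should still propagate.

```python
class HttpError(Exception):
    """Hypothetical error raised by a storage client."""

    def __init__(self, status_code):
        super(HttpError, self).__init__("HTTP %d" % status_code)
        self.status_code = status_code


def exists(get_metadata, path):
    """Returns True if path exists and False if it is reported missing."""
    try:
        get_metadata(path)
        return True
    except HttpError as error:
        if error.status_code == 404:
            # A missing object is an expected outcome, not a failure.
            return False
        # Other errors (for example transient ones) are left to the caller.
        raise
```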
[GitHub] incubator-beam pull request #765: [BEAM-502] Updates JSON to/from Python obj...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/765 [BEAM-502] Updates JSON to/from Python object conversion to handle null/None values. Updates Python object to JSON conversion to handle 'None' values. Updates JSON to Python object conversion to return 'None' for JSON null objects instead of '[]'. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam json_handle_null Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/765.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #765 commit 3a4f2234e36791fd64125d67dbedaebbc2a981b1 Author: Chamikara Jayalath Date: 2016-08-02T06:12:10Z Updates json to/from Python object conversion to properly handle None values.
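The expected mapping between None and JSON null can be illustrated with the standard `json` module. This is only a reference point for the behaviour described above, not the SDK's own conversion helpers, and the two function names are hypothetical.

```python
import json


def to_json_text(obj):
    """Serializes a Python object; None becomes the JSON literal null."""
    return json.dumps(obj)


def from_json_text(text):
    """Parses JSON text; null becomes None rather than an empty list."""
    return json.loads(text)
```

Keeping null and the empty list distinct matters because a round trip should return exactly the value that was written.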
[GitHub] incubator-beam pull request #763: [BEAM-499] Deletes some code that is not u...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/763 [BEAM-499] Deletes some code that is not used by SDK. Some code in apiclient.py is not used by the Python SDK. Deletes the unused code and the corresponding tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam delete_unused_apiclient_code Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/763.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #763 commit 7d50d8040585e0cea5bc02de4cb199f29c1472fc Author: Chamikara Jayalath Date: 2016-07-29T22:40:39Z Deletes some code that is not used by SDK. Also deletes corresponding tests.
[GitHub] incubator-beam pull request #672: [BEAM-360] Adds a PTransform for Avro sour...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/672
[GitHub] incubator-beam pull request #667: [BEAM-455] Adds a test harnesses and utili...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/667
[GitHub] incubator-beam pull request #670: Clarifies that 'TextFileSource' only suppo...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/670
[GitHub] incubator-beam pull request #672: [BEAM-360] Adds a PTransform for Avro sour...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/672 [BEAM-360] Adds a PTransform for Avro source and updates snippets. Wrapping a custom source as a 'PTransform' is better than directly using the source using 'df.Read' since the 'PTransform' can be extended without breaking end-user code. Updates the documentation of avroio module. Adds 'PTransform' wrappers to custom sources and sinks in 'snippets.py'. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam snippets_source_sink_ptransforms Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/672.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #672 commit 0f206bd4581a341cec0c29348aa68d552b7f5a00 Author: Charles Chen Date: 2016-07-15T21:37:43Z Adds a PTransform for Avro source. Wrapping a custom source as a PTransform is better than directly using the source using df.Read since the PTransform can be extended without breaking end-user code. Updates the documentation of avroio module. Adds PTransform wrappers to custom sources and sinks in snippets.py.
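The rationale given in pull request #672 (a wrapper can grow new options without breaking callers) can be illustrated with a small plain-Python sketch. This is not the Beam API: `CountingSource`, `ReadFromCountingSource`, `expand()` as used here, and the `skip_first` option are all hypothetical stand-ins chosen for illustration.

```python
class CountingSource:
    """Hypothetical custom source that yields the integers 0..count-1."""

    def __init__(self, count):
        self.count = count

    def read(self):
        return iter(range(self.count))


class ReadFromCountingSource:
    """Hypothetical wrapper standing in for a PTransform around the source.

    End users construct only this wrapper, so the source class behind it can
    change, and options such as `skip_first` can be added later with a
    default value, without touching existing call sites.
    """

    def __init__(self, count, skip_first=False):
        self._source = CountingSource(count)
        self._skip_first = skip_first

    def expand(self):
        # Materialize the records; a real transform would return a collection.
        items = list(self._source.read())
        return items[1:] if self._skip_first else items


# An existing call site keeps working after `skip_first` was introduced.
old_style = ReadFromCountingSource(3).expand()
new_style = ReadFromCountingSource(3, skip_first=True).expand()
```

Code that instantiates the source directly would have to change whenever the source's constructor or read path changes, which is the breakage the wrapper is meant to avoid.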
[GitHub] incubator-beam pull request #670: Clarifies that 'TextFileSource' only suppo...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/670 Clarifies that 'TextFileSource' only supports UTF-8 and ASCII encodings. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam utf8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/670.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #670 commit d829fc82fd78cd3fefb672381e5580b30bf13ff3 Author: Chamikara Jayalath Date: 2016-07-15T21:12:16Z Clarifies that 'TextFileSource' only supports UTF-8 and ASCII encodings.
[GitHub] incubator-beam pull request #667: [BEAM-455] Adds a test harnesses and utili...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/667 [BEAM-455] Adds a test harnesses and utilities framework for sources. Helper functions and test harnesses for checking the correctness of source (``iobase.BoundedSource``) and range tracker (``iobase.RangeTracker``) implementations. Contains a few lightweight utilities (e.g. for reading items from a source, such as ``readFromSource()``), as well as heavyweight property-testing and stress-testing harnesses that help achieve a large amount of test coverage with little code. The most notable ones are: * ``assertSourcesEqualReferenceSource()`` helps test that the data read by the union of sources produced by ``BoundedSource.split()`` is the same as the data read by the original source. * If your source implements dynamic work rebalancing, use the ``assertSplitAtFraction()`` family of functions. They test the behavior of ``RangeTracker.try_split()``, in particular that various consistency properties are respected and that the total set of data read by the source is preserved when splits happen. Use ``assertSplitAtFractionBehavior()`` to test individual cases of dynamic work rebalancing and use ``assertSplitAtFractionExhaustive()`` as a heavyweight stress test that includes concurrency. Updates 'avroio_test.py' to use the test framework. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam testingframework_l Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/667.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #667 commit b3ac40ecca86f5952bd5ac3799e55e3e2a1a72a6 Author: Chamikara Jayalath Date: 2016-07-15T18:07:48Z Adds a test harnesses and utilities framework for sources. 
Helper functions and test harnesses for checking the correctness of source (``iobase.BoundedSource``) and range tracker (``iobase.RangeTracker``) implementations. Contains a few lightweight utilities (e.g. for reading items from a source, such as ``readFromSource()``), as well as heavyweight property-testing and stress-testing harnesses that help achieve a large amount of test coverage with little code. The most notable ones are: * ``assertSourcesEqualReferenceSource()`` helps test that the data read by the union of sources produced by ``BoundedSource.split()`` is the same as the data read by the original source. * If your source implements dynamic work rebalancing, use the ``assertSplitAtFraction()`` family of functions. They test the behavior of ``RangeTracker.try_split()``, in particular that various consistency properties are respected and that the total set of data read by the source is preserved when splits happen. Use ``assertSplitAtFractionBehavior()`` to test individual cases of ``splitAtFraction()`` and use ``assertSplitAtFractionExhaustive()`` as a heavyweight stress test that includes concurrency.
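The first property described in pull request #667 (the union of the sub-sources produced by splitting reads the same data as the original source) can be sketched in plain Python. This is only an illustration of the idea under stated assumptions, not the harness from the pull request: `RangeSource` and `assert_splits_equal_reference` are hypothetical names, and the toy source reads integers rather than file records.

```python
class RangeSource:
    """Hypothetical bounded source over the integers in [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def read(self):
        return list(range(self.start, self.stop))

    def split(self, bundle_size):
        # Cut the range into consecutive sub-sources of at most bundle_size items.
        return [
            RangeSource(begin, min(begin + bundle_size, self.stop))
            for begin in range(self.start, self.stop, bundle_size)
        ]


def assert_splits_equal_reference(reference, splits):
    """Checks that the splits together read exactly what the reference reads."""
    expected = sorted(reference.read())
    actual = sorted(item for split in splits for item in split.read())
    assert actual == expected, "splits read %r, expected %r" % (actual, expected)


source = RangeSource(0, 10)
splits = source.split(3)
assert_splits_equal_reference(source, splits)
```

Comparing sorted lists rather than sets means the check also catches records that are read twice, for example when two splits overlap.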
[GitHub] incubator-beam pull request #599: [BEAM-360] Some updates related to dynamic...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/599
[GitHub] incubator-beam pull request #599: [BEAM-360] Some updates related to dynamic...
GitHub user chamikaramj reopened a pull request: https://github.com/apache/incubator-beam/pull/599 [BEAM-360] Some updates related to dynamic work rebalancing of custom sources. Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing results of custom sources. Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources. Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/599.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #599 commit 19a41ccf5bcf00192e3646258eae0cbce85da23b Author: Chamikara Jayalath Date: 2016-07-07T03:25:04Z Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources. Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources. Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'. commit 4415989ef0dfd656643e6e8575b6e2090b4437b5 Author: Chamikara Jayalath Date: 2016-07-07T03:34:21Z Adds more comments. commit 6aa697465e88f827a3121a1de8bad1b810d904da Author: Chamikara Jayalath Date: 2016-07-07T04:41:20Z Some updates related to dynamic work rebalancing custom sources. Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources. Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources. Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'. 
commit 1e01b1f5cd70e5b39cd064577110898c623e524a Author: Chamikara Jayalath Date: 2016-07-08T19:01:42Z Reverting some updates. commit 171df1ecedd51c7c72db309d526dfa9badf1 Author: Chamikara Jayalath Date: 2016-07-08T22:34:52Z Adds a method 'fileio.ChannelFactory.size_in_bytes()' that can be used to determine the size of a single file. Updates 'filebasedsource' to use this method when determining size of files.
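Pull request #599 changes 'position_at_fraction()' to return an integer position instead of a float. The following is a minimal sketch of what that can look like for an offset range, not the code from the pull request: the free function and its `start`/`stop` parameters are hypothetical, and rounding up with `math.ceil` is an assumption made here for illustration. Python 2's 'long' corresponds to 'int' in Python 3.

```python
import math


def position_at_fraction(start, stop, fraction):
    """Maps a fraction in [0, 1] of the range [start, stop) to an integer offset.

    Hypothetical helper: offsets such as byte positions must be whole numbers,
    so the fractional result is rounded up to the next integer (an assumption).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1], got %r" % (fraction,))
    return int(math.ceil(start + fraction * (stop - start)))


quarter = position_at_fraction(0, 10, 0.25)
middle = position_at_fraction(10, 20, 0.5)
```

Returning an integer lets the result be used directly as a split offset without callers having to choose their own rounding.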
[GitHub] incubator-beam pull request #599: [BEAM-360] Some updates related to dynamic...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/599
[GitHub] incubator-beam pull request #599: [BEAM-360] Some updates related to dynamic...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/599 [BEAM-360] Some updates related to dynamic work rebalancing of custom sources. Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing results of custom sources. Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources. Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/599.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #599 commit e51d4acf12133a79671c567c9ff709c941c54f8c Author: Chamikara Jayalath Date: 2016-06-21T01:09:50Z Implements a framework for developing sources for new file types. Module 'filebasedsource' provides a framework for creating sources for new file types. This framework readily implements several features common to many sources based on files. Additionally, module 'avroio' contains a new source, 'AvroSource', that is implemented using the framework described above. 'AvroSource' is a source for reading Avro files. Adds many unit tests for 'filebasedsource' and 'avroio' modules. commit cacb613448b47592f8415570f7b64bc6de797f91 Author: Chamikara Jayalath Date: 2016-07-07T03:25:04Z Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources. Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources. Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'. 
commit 264b4afc17c255e568a490e02ce47e9fb4b1e17a Author: Chamikara Jayalath Date: 2016-07-07T03:34:21Z Adds more comments. commit 49e097f9c5c3d8c2bca48d3416b4934a4d86ed34 Author: Chamikara Jayalath Date: 2016-07-07T04:41:06Z Some updates related to dynamic work rebalancing custom sources. Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources. Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources. Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'. commit c9696c9e17c9c7a6fc13d53d4da21ac9b325c73c Author: Chamikara Jayalath Date: 2016-07-07T04:41:20Z Some updates related to dynamic work rebalancing custom sources. Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources. Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing custom sources. Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.
[GitHub] incubator-beam pull request #565: [BEAM-393] Adds more code snippets for Pyt...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/565 [BEAM-393] Adds more code snippets for Python SDK Adds code snippets related to the following: (1) Creating and using a new custom source (2) Creating and using a new custom sink (3) Performing joins using side inputs You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam update_snippets2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/565.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #565 commit dfe0af624a43c0b6df23dc078bd5af300ed8848f Author: Chamikara Jayalath Date: 2016-06-25T08:53:12Z Adds code snippets for custom sources and sinks commit 9dc3ce0c2d6d8f596a65c47fc64f66daae1a9b87 Author: Chamikara Jayalath Date: 2016-06-28T21:01:43Z Adds code snippets for custom sources and sinks. commit 647013d9d4393d89b3026e5150ddefd59740e4f8 Author: Chamikara Jayalath Date: 2016-06-28T23:28:23Z Adds code snippets for custom sources and sinks. commit 312c2c13aac13869851f5fe9f6bca27962f22b5b Author: Chamikara Jayalath Date: 2016-06-28T23:33:22Z Adds code snippets for custom sources and sinks. commit 06ef713fe20945c35021d63081104f2dbe6115aa Author: Chamikara Jayalath Date: 2016-06-29T22:30:22Z Adds code snippets for custom sources and sinks. commit c5a916940af6176a12380246224320541a0af2b0 Author: Chamikara Jayalath Date: 2016-06-29T23:21:02Z Adds code snippets for custom sources, custom sinks, and joining using side inputs.
[GitHub] incubator-beam pull request #507: [BEAM-360] Implements a framework for deve...
Github user chamikaramj closed the pull request at: https://github.com/apache/incubator-beam/pull/507
[GitHub] incubator-beam pull request #507: [BEAM-360] Implements a framework for deve...
GitHub user chamikaramj opened a pull request: https://github.com/apache/incubator-beam/pull/507 [BEAM-360] Implements a framework for developing Python SDK sources for new file types Module 'filebasedsource' provides a framework for creating sources for new file types. This framework implements several features common to many sources based on files. Additionally, module 'avroio' contains a new source, 'AvroSource', that is implemented using the framework described above. 'AvroSource' is a source for reading Avro files. Adds many unit tests for 'filebasedsource' and 'avroio' modules. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chamikaramj/incubator-beam filebasedsource Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/507.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #507 commit e51d4acf12133a79671c567c9ff709c941c54f8c Author: Chamikara Jayalath Date: 2016-06-21T01:09:50Z Implements a framework for developing sources for new file types. Module 'filebasedsource' provides a framework for creating sources for new file types. This framework readily implements several features common to many sources based on files. Additionally, module 'avroio' contains a new source, 'AvroSource', that is implemented using the framework described above. 'AvroSource' is a source for reading Avro files. Adds many unit tests for 'filebasedsource' and 'avroio' modules.