[GitHub] incubator-beam pull request #1385: Fixes couple of issues of FileBasedSource...

2016-11-18 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/1385


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request #1385: Fixes couple of issues of FileBasedSource...

2016-11-17 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/1385

Fixes a couple of issues of FileBasedSource.

(1) Updates code so that a user-specified coder properly gets set to splits.

(2) Currently each SingleFileSource takes a reference to FileBasedSource,
while FileBasedSource takes a reference to ConcatSource. ConcatSource has a
reference to the list of SingleFileSources. This results in quadratic space
complexity when serializing splits of a FileBasedSource. This CL fixes this by
making sure that FileBasedSource is cloned before taking a reference to
ConcatSource.
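
The space issue described in (2) can be illustrated with a small sketch. The
classes below (Parent, Child) are hypothetical stand-ins and not the Beam
classes, and pickle is only assumed here as the serializer. When every child
shares a parent that points back at the full list, pickling each child on its
own drags the whole list along; cloning the parent before it learns about the
list avoids that.

```python
import copy
import pickle


class Parent(object):
  """Hypothetical stand-in for a FileBasedSource-like object."""

  def __init__(self, pattern):
    self.pattern = pattern
    self.concat = None  # later points at the list of all child sources


class Child(object):
  """Hypothetical stand-in for a SingleFileSource-like object."""

  def __init__(self, parent, file_name):
    self.parent = parent
    self.file_name = file_name


def make_splits(num_files, clone_parent):
  parent = Parent('/tmp/data-*')
  # Clone before the parent learns about its children, so that the
  # children hold a parent that has no back-reference to the list.
  child_parent = copy.copy(parent) if clone_parent else parent
  children = [Child(child_parent, 'file-%05d' % i) for i in range(num_files)]
  parent.concat = children
  return children


def total_serialized_size(children):
  # Each split is pickled separately, as if shipped to a separate worker.
  return sum(len(pickle.dumps(c)) for c in children)
```

With the shared parent, doubling the number of files roughly quadruples the
total serialized size; with the cloned parent it roughly doubles.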

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam filebasedsource_manyfiles_fix1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1385


commit 0851a8fc2cf6214673777e31bdacaa3409547a6c
Author: Chamikara Jayalath 
Date:   2016-11-18T03:18:26Z

Fixes a couple of issues of FileBasedSource.

(1) Updates code so that a user-specified coder properly gets set to 
sub-sources.

(2) Currently each SingleFileSource takes a reference to FileBasedSource,
while FileBasedSource takes a reference to ConcatSource. ConcatSource has a
reference to the list of SingleFileSources. This results in quadratic space
complexity when serializing splits of a FileBasedSource. This CL fixes this
issue by making sure that FileBasedSource is cloned before taking a reference
to ConcatSource.






[GitHub] incubator-beam pull request #1267: Fixes two bugs in avroio_test 'test_corru...

2016-11-02 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/1267

Fixes two bugs in avroio_test 'test_corrupted_file'.

(1) Updates the test to perform corruption properly (setting 'A' and 'B').
(2) Removes an invalid usage of bytearray().

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam fix_avro_test_flake

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1267


commit a7e543a15f0db890a80777251719db2d05001ba2
Author: Chamikara Jayalath 
Date:   2016-11-02T21:33:09Z

Fixes two bugs in avroio_test 'test_corrupted_file'.

(1) Updates test to perform corruption properly (setting 'A' and 'B').
(2) Removes an invalid usage of bytearray().






[GitHub] incubator-beam pull request #1235: [BEAM-700] Improvements related to size e...

2016-10-31 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/1235




[GitHub] incubator-beam pull request #1235: [BEAM-700] Improvements related to size e...

2016-10-31 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/1235

[BEAM-700] Improvements related to size estimation.

Updates FileBasedSource so that size estimation of glob patterns that 
expand into a large number of files is done using sampling.

Updates Dataflow runner to set estimated sizes of sources when submitting 
jobs.
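
The pull request does not spell out how the sampling is done, so the following
is only a minimal sketch under assumed details: a hypothetical
estimate_total_size helper looks up the sizes of a bounded random sample of
files and scales the sampled total up to the full file count. The max_sample
and seed parameters are illustrative.

```python
import random


def estimate_total_size(file_sizes, max_sample=100, seed=0):
  """Estimates sum(file_sizes) by reading only a sample of the entries.

  file_sizes stands in for per-file size lookups, which would be the
  expensive part for a real file system.
  """
  num_files = len(file_sizes)
  if num_files <= max_sample:
    # Few enough files: compute the exact total.
    return sum(file_sizes)
  rng = random.Random(seed)
  sample = rng.sample(range(num_files), max_sample)
  sampled_total = sum(file_sizes[i] for i in sample)
  # Scale the sampled total up to the full number of files.
  return int(sampled_total * float(num_files) / max_sample)
```

The estimate is exact when all files have the same size and approximate
otherwise; the number of size lookups is capped at max_sample.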

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam estimate_size_of_sources

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1235


commit 42b1d9f541a54ea49ec717934632e8e82c21e911
Author: chamik...@google.com 
Date:   2016-10-31T14:39:13Z

Improvements related to size estimation.

Updates FileBasedSource so that size estimation of glob patterns that 
expand into a large number of files is done using sampling.

Updates Dataflow runner to set estimated sizes of sources when submitting 
jobs.






[GitHub] incubator-beam pull request #1090: [BEAM-737][BEAM-738] Updates source API d...

2016-10-18 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/1090




[GitHub] incubator-beam pull request #1090: [BEAM-737][BEAM-738] Updates source API d...

2016-10-12 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/1090

[BEAM-737][BEAM-738] Updates source API documentation to mention that 
sources should be immutable and updates existing sources accordingly

Updates source API documentation to mention that source objects should not 
be mutated.

Updates textio._TextSource so that it does not get mutated while reading.

Updates source_test_utils so that source objects do not get cloned while
testing. This could help to catch sources that erroneously get modified while
reading.

Adds reentrancy tests for text and Avro sources.
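
As a rough illustration of what a reentrancy check looks like, the sketch
below uses a hypothetical LineSource (not a Beam class) whose read() keeps all
of its state in local variables. The assert_reentrant helper is also
hypothetical: it interleaves two reads of the same source object and checks
that both see the same records.

```python
class LineSource(object):
  """Hypothetical immutable source: read() keeps all state in locals."""

  def __init__(self, lines):
    self._lines = tuple(lines)

  def read(self):
    position = 0  # local, so concurrent or repeated reads do not interfere
    while position < len(self._lines):
      yield self._lines[position]
      position += 1


def assert_reentrant(source):
  """Starts two reads, interleaves them, and checks they agree."""
  first, second = source.read(), source.read()
  interleaved_first, interleaved_second = [], []
  for a, b in zip(first, second):
    interleaved_first.append(a)
    interleaved_second.append(b)
  expected = list(source.read())
  assert interleaved_first == expected
  assert interleaved_second == expected
  return expected
```

A source that stored its read position on self would fail this check, because
the two interleaved reads would advance a shared counter.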

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam update_boundedsource_stateless

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1090


commit 5bd56f35d597b142463b1a141e2ac219e6902fc3
Author: Chamikara Jayalath 
Date:   2016-10-12T23:51:20Z

Updates source API documentation to mention that source objects should not 
be mutated.

Updates textio._TextSource so that it does not get mutated while reading.

Updates source_test_utils so that source objects do not get cloned while
testing. This could help to catch sources that erroneously get modified while
reading.

Adds reentrancy tests for text and Avro sources.






[GitHub] incubator-beam pull request #1058: Fixes a bug in avroio_test.py

2016-10-07 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/1058




[GitHub] incubator-beam pull request #1058: Fixes a bug in avroio_test.py

2016-10-05 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/1058

Fixes a bug in avroio_test.py

Fixes a bug in avroio_test.py where we open a binary file without 'b' mode.
Without this, the file can get corrupted on Windows and the test becomes flaky.
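
A minimal sketch of the safe pattern, using a hypothetical
write_and_read_binary helper rather than the actual test code: both the write
and the read use a binary mode, so the bytes round-trip unchanged on every
platform.

```python
import os
import tempfile


def write_and_read_binary(data):
  """Round-trips bytes through a temp file using binary modes."""
  fd, path = tempfile.mkstemp()
  os.close(fd)
  try:
    # 'wb' / 'rb': without 'b', Windows text mode translates '\n' bytes
    # to '\r\n' on write, which corrupts binary formats such as Avro.
    with open(path, 'wb') as f:
      f.write(data)
    with open(path, 'rb') as f:
      return f.read()
  finally:
    os.remove(path)
```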

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam fix_avro_test_windows

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1058.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1058


commit 6ed258e97a3bde460315b5aef1449c38f80dc564
Author: Chamikara Jayalath 
Date:   2016-10-05T23:23:09Z

Fixes a bug in avroio_test.py where we open a binary file without 'b'
mode. Without this, the file can get corrupted on Windows and the test
becomes flaky.






[GitHub] incubator-beam pull request #881: [BEAM-564] Updates sources to report consu...

2016-10-05 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/881




[GitHub] incubator-beam pull request #1002: [BEAM-614] Updates FileBasedSource to sup...

2016-09-26 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/1002

[BEAM-614] Updates FileBasedSource to support CompressionType.AUTO.

Updates FileBasedSource to support CompressionType.AUTO.

Fixes some tests that were not properly being tested.

Adds tests for CompressionType.AUTO.
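
The message does not describe how AUTO picks a compression type. One plausible
approach, sketched here purely as an assumption, is to infer it from the file
extension. The constants, the extension table, and resolve_compression are all
hypothetical names for this illustration.

```python
AUTO, GZIP, BZIP2, UNCOMPRESSED = 'auto', 'gzip', 'bzip2', 'uncompressed'

# Assumed extension mapping for this sketch.
_EXTENSIONS = {'.gz': GZIP, '.bz2': BZIP2}


def resolve_compression(file_name, compression_type=AUTO):
  """Returns the compression type to use for a single file."""
  if compression_type != AUTO:
    # An explicit choice always wins over detection.
    return compression_type
  for extension, detected in _EXTENSIONS.items():
    if file_name.endswith(extension):
      return detected
  return UNCOMPRESSED
```

Resolving the type per file, rather than once per pattern, lets a single glob
match a mix of compressed and uncompressed files.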



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam fix_compressed_auto_2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1002.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1002


commit 37aa7a457be0c8bfef0bb5f1b6cc973ead93f7b1
Author: chamik...@google.com 
Date:   2016-09-26T04:44:34Z

Updates filebasedsource to support CompressionType.AUTO.






[GitHub] incubator-beam pull request #987: Adds __all__ tags to source modules.

2016-09-22 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/987

Adds __all__ tags to source modules.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam add_all_tags_to_source_modules

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/987.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #987


commit fbda5eeff89b65d8022fc32babb328386cae0bca
Author: Chamikara Jayalath 
Date:   2016-09-22T16:08:11Z

Adds __all__ tags to source modules.






[GitHub] incubator-beam pull request #978: [BEAM-643] Updates Dataflow API client.

2016-09-20 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/978




[GitHub] incubator-beam pull request #975: [BEAM-643] Adds support for specifying a c...

2016-09-20 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/975




[GitHub] incubator-beam pull request #979: [BEAM-643] Updates lint configurations to ...

2016-09-20 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/979




[GitHub] incubator-beam pull request #979: [BEAM-643] Updates lint configurations to ...

2016-09-19 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/979

[BEAM-643] Updates lint configurations to ignore generated files.

Adds ability to ignore certain generated files when running pylint and pep8.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam dataflow_pylint_exclude_files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/979.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #979


commit c9a7434804cb84ced7de84794dc9070de4069db2
Author: Chamikara Jayalath 
Date:   2016-09-20T05:27:47Z

Updates lint configurations to ignore generated files.






[GitHub] incubator-beam pull request #978: [BEAM-643] Updates Dataflow API client.

2016-09-19 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/978

[BEAM-643] Updates Dataflow API client.

Updates Cloud Dataflow API client files to the latest version.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam dataflow_client_api_update

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/978.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #978


commit a1cc0ba4e8eed75252587fc7cbd9aaec8c396f01
Author: Chamikara Jayalath 
Date:   2016-09-20T05:16:04Z

Updates Dataflow API client.






[GitHub] incubator-beam pull request #975: [BEAM-643] Adds support for specifying a c...

2016-09-19 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/975

[BEAM-643] Adds support for specifying a custom service account.

Adds support for specifying a custom service account when using 
DataflowPipelineRunner.

Updates Dataflow API client to latest version.

Adds ability to skip generated files during lint checks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam custom_service_account

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #975


commit 11ace177a238cbd8439b1ee8d13e83cf6285c304
Author: Chamikara Jayalath 
Date:   2016-09-19T22:52:53Z

Adds support for specifying a custom service account.

Updates Dataflow API client to latest version.

Adds ability to skip generated files during lint checks.






[GitHub] incubator-beam pull request #920: [BEAM-553] Adds a text source for Python S...

2016-09-18 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/920




[GitHub] incubator-beam pull request #920: [BEAM-553] Adds a text source for Python S...

2016-09-05 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/920

[BEAM-553] Adds a text source for Python SDK.

The current text source (fileio.TextFileSource) is specific to the Dataflow
runner. This adds a runner-independent TextSource that is based on the
iobase.BoundedSource interface.

Adds a textio module that contains text source, text sink, and PTransforms 
that can be used to read and write text files.

Adds a significant number of tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam text_source

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #920


commit bb1ff90307b563656a54731cada05e41cd9e82b8
Author: Chamikara Jayalath 
Date:   2016-08-30T01:08:46Z

Adds a text source to Python SDK.






[GitHub] incubator-beam pull request #890: Updates SourceTestBase concurrent splittin...

2016-08-25 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/890

Updates SourceTestBase concurrent splitting test to share thread pool

Updates SourceTestBase concurrent splitting test to share thread pool 
across runs.

Without this, runs could fail in environments that prevent too many
threads from being created.

Makes some minor fixes to source_test_utils_test.
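
The idea of sharing a pool across runs can be sketched as follows. This is not
the SourceTestBase code; run_trial and run_all_trials are hypothetical helpers,
and the squaring task merely stands in for real concurrent work.

```python
from concurrent.futures import ThreadPoolExecutor


def run_trial(pool, trial_id):
  """Runs one trial with two concurrent tasks on the shared pool."""
  futures = [pool.submit(lambda x: x * x, trial_id + i) for i in range(2)]
  return [f.result() for f in futures]


def run_all_trials(num_trials, max_workers=2):
  # One pool for all trials instead of a new pool per trial, so the number
  # of live threads stays bounded by max_workers.
  with ThreadPoolExecutor(max_workers=max_workers) as pool:
    return [run_trial(pool, t) for t in range(num_trials)]
```

Creating a fresh pool inside run_trial would give the same results but would
start new threads on every trial, which is what restricted environments can
reject.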

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam fix_source_utils_test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #890


commit b901c59a15ebe9171c3b216e666fc0f79a61429d
Author: Chamikara Jayalath 
Date:   2016-08-26T06:18:42Z

Updates SourceTestBase concurrent splitting test to share thread pool 
across runs.

Without this, runs could fail in environments that prevent too many
threads from being created.

Performs some minor fixes to source_test_utils_test.






[GitHub] incubator-beam pull request #881: Updates sources to report consumed and rem...

2016-08-24 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/881

Updates sources to report consumed and remaining number of split points.

Adds several methods to the RangeTracker interface to support this. Please 
see comments for details.

Updates AvroSource and LineSource (test) to report split points properly.

Runners can use this information to determine the amount of remaining and 
consumed parallelism of source read operations. Java SDK sources framework 
already supports reporting these signals.
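
To make the consumed/remaining signals concrete, here is a small sketch of a
tracker that reports them. CountingTracker and its method names are
hypothetical and are not the methods added to the RangeTracker interface. The
sketch also assumes the convention that the split point currently being read
does not yet count as consumed.

```python
class CountingTracker(object):
  """Hypothetical tracker over a fixed number of split points."""

  UNKNOWN = object()  # sentinel for "remaining count not known"

  def __init__(self, total_split_points=None):
    self._total = total_split_points
    self._claimed = 0

  def claim(self):
    """Records that the reader has started the next split point."""
    self._claimed += 1

  def split_points(self):
    """Returns (consumed, remaining).

    The split point currently being read is not counted as consumed.
    """
    consumed = max(self._claimed - 1, 0)
    if self._total is None:
      return consumed, self.UNKNOWN
    return consumed, self._total - consumed
```

A runner could poll split_points() during a read: a small remaining count
suggests there is little parallelism left to gain from further splitting.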

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam limited_parallelism

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/881.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #881


commit 8b78a7bfb945f80b346000bfbd39bb6fa4a933e3
Author: Chamikara Jayalath 
Date:   2016-08-24T22:31:47Z

Updates sources to report consumed and remaining number of split points.

Adds several methods to the RangeTracker interface to support this. Please 
see comments for details.

Updates AvroSource and LineSource (test) to report split points properly.

Runners can use this information to determine the amount of remaining and 
consumed parallelism of source read operations. Java SDK sources framework 
already supports reporting these signals.






[GitHub] incubator-beam pull request #866: [BEAM-578] Updates FileBasedSource so that...

2016-08-23 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/866




[GitHub] incubator-beam pull request #866: [BEAM-578] Updates FileBasedSource so that...

2016-08-22 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/866

[BEAM-578] Updates FileBasedSource so that sub-classes can prevent 
splitting.

File patterns will be split into sources of individual files, but any 
further splitting into data ranges will be prevented. This prevents both 
initial and dynamic splitting.

Introduces UnsplittableRangeTracker, which can be used to make any given 
RangeTracker object unsplittable.
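
The wrapper idea can be sketched with two hypothetical classes (neither is the
Beam implementation, and the method names are assumed): a simple splittable
tracker, and a wrapper that delegates claims to it but refuses every split
request.

```python
class OffsetTracker(object):
  """Hypothetical splittable tracker over [start, stop)."""

  def __init__(self, start, stop):
    self.start, self.stop = start, stop

  def try_claim(self, position):
    return self.start <= position < self.stop

  def try_split(self, position):
    # Give away [position, stop) if position is strictly inside the range.
    if self.start < position < self.stop:
      self.stop = position
      return position
    return None


class UnsplittableTracker(object):
  """Delegates to a wrapped tracker but refuses every split request."""

  def __init__(self, tracker):
    self._tracker = tracker

  def try_claim(self, position):
    return self._tracker.try_claim(position)

  def try_split(self, position):
    return None  # never give up any part of the range
```

Because the wrapper only overrides splitting, any existing tracker can be made
unsplittable without changing its claim logic.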

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam unsplittable_file_based_source

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #866


commit d0004369875a999e902a93d8df5f7839b7a56674
Author: Chamikara Jayalath 
Date:   2016-08-23T04:06:36Z

Updates FileBasedSource so that subclasses can prevent splitting into data
ranges.

File patterns will be split into sources of individual files, but any 
further splitting into data ranges will be prevented. This prevents both 
initial and dynamic splitting.

Introduces UnsplittableRangeTracker, which can be used to make any given 
RangeTracker object unsplittable.






[GitHub] incubator-beam pull request #779: [BEAM-522] Fixes GcsIO.exists() to properl...

2016-08-03 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/779

[BEAM-522] Fixes GcsIO.exists() to properly handle files that do not exist

Currently this invocation fails for nonexistent files instead of returning
false.

Updates FileSink.finalize_write() so that we capture and log any transient 
errors that get thrown at the channel_factory.exists() call.
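
The intended behavior of an exists() check can be sketched as follows. The
error classes and the get_metadata callable are hypothetical stand-ins for a
storage client, not the GcsIO API: only a "not found" error is translated to
False, and anything else is left for the caller to handle or log.

```python
class NotFoundError(Exception):
  """Hypothetical error a storage client raises for a missing object."""


class TransientError(Exception):
  """Hypothetical error for a temporary backend failure."""


def exists(get_metadata, path):
  """Returns True/False; only a 'not found' error maps to False."""
  try:
    get_metadata(path)
    return True
  except NotFoundError:
    return False
  # Any other error (e.g. TransientError) propagates to the caller.
```

Keeping transient errors distinct from "not found" matters for a finalize
step, where a wrong False could cause work to be repeated or skipped.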

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam sink_finalize_fix_idempotency

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #779


commit 792c3b5c79b6e979bc34bcf457f8a33cebd74daf
Author: Chamikara Jayalath 
Date:   2016-08-04T01:25:41Z

Fixes GcsIO.exists() to properly handle files that do not exist.

Currently this invocation fails for nonexistent files instead of returning
false.

Updates FileSink.finalize_write() so that we capture and log any transient 
errors that get thrown at the channel_factory.exists() call.






[GitHub] incubator-beam pull request #765: [BEAM-502] Updates JSON to/from Python obj...

2016-08-01 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/765

[BEAM-502] Updates JSON to/from Python object conversion to handle 
null/None values.

Updates Python object to JSON conversion to handle 'None' values.

Updates JSON to Python object conversion to return 'None' for JSON null 
objects instead of '[]'.
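
The expected round trip can be shown with the standard library json module,
used here only as a stand-in for the SDK's own conversion helpers (to_json and
from_json are hypothetical names): None maps to JSON null, and null maps back
to None rather than to an empty list.

```python
import json


def to_json(value):
  """Serializes a Python value; None becomes JSON null."""
  return json.dumps(value)


def from_json(text):
  """Parses JSON; null becomes None rather than an empty list."""
  return json.loads(text)
```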

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam json_handle_null

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/765.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #765


commit 3a4f2234e36791fd64125d67dbedaebbc2a981b1
Author: Chamikara Jayalath 
Date:   2016-08-02T06:12:10Z

Updates JSON to/from Python object conversion to properly handle None
values.






[GitHub] incubator-beam pull request #763: [BEAM-499] Deletes some code that is not u...

2016-07-29 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/763

[BEAM-499] Deletes some code that is not used by SDK.

Some code in apiclient.py is not used by the Python SDK.

Deletes the unused code and corresponding tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam delete_unused_apiclient_code

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/763.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #763


commit 7d50d8040585e0cea5bc02de4cb199f29c1472fc
Author: Chamikara Jayalath 
Date:   2016-07-29T22:40:39Z

Deletes some code that is not used by SDK.

Also deletes corresponding tests.






[GitHub] incubator-beam pull request #672: [BEAM-360] Adds a PTransform for Avro sour...

2016-07-19 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/672




[GitHub] incubator-beam pull request #667: [BEAM-455] Adds a test harnesses and utili...

2016-07-19 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/667




[GitHub] incubator-beam pull request #670: Clarifies that 'TextFileSource' only suppo...

2016-07-18 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/670




[GitHub] incubator-beam pull request #672: [BEAM-360] Adds a PTransform for Avro sour...

2016-07-18 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/672

[BEAM-360] Adds a PTransform for Avro source and updates snippets.

Wrapping a custom source as a 'PTransform' is better than using the source
directly through 'df.Read', since the 'PTransform' can be extended without
breaking end-user code.

Updates the documentation of avroio module.

Adds 'PTransform' wrappers to custom sources and sinks in 'snippets.py'.
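
The wrapping idea can be illustrated with a minimal sketch in plain Python. It does not use the SDK: 'CountingSource', 'Read' and 'ReadFromCountingSource' are hypothetical stand-ins invented for this example, and 'Read' only imitates the role that 'df.Read' plays in the description above.

```python
class CountingSource:
    """Hypothetical stand-in for a custom source that yields 0..count-1."""

    def __init__(self, count):
        self._count = count

    def read(self):
        return iter(range(self._count))


class Read:
    """Hypothetical stand-in for a generic read transform."""

    def __init__(self, source):
        self._source = source

    def expand(self):
        return list(self._source.read())


class ReadFromCountingSource:
    """Wrapper transform: callers never construct the source themselves."""

    def __init__(self, count):
        self._count = count

    def expand(self):
        # The source class and how it is read are hidden behind the wrapper,
        # so both can change later without touching caller code.
        return Read(CountingSource(self._count)).expand()


# Direct use ties the caller to the source class and to the read transform.
direct = Read(CountingSource(3)).expand()

# Wrapped use exposes only user-level arguments.
wrapped = ReadFromCountingSource(3).expand()
```

Both calls produce the same records. The difference is that only the wrapped form keeps working if the source's constructor or the reading mechanism changes.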

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam snippets_source_sink_ptransforms

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/672.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #672


commit 0f206bd4581a341cec0c29348aa68d552b7f5a00
Author: Charles Chen 
Date:   2016-07-15T21:37:43Z

Adds a PTransform for Avro source.

Wrapping a custom source as a PTransform is better than using the source
directly through df.Read, since the PTransform can be extended without
breaking end-user code.

Updates the documentation of avroio module.

Adds PTransform wrappers to custom sources and sinks in snippets.py.






[GitHub] incubator-beam pull request #670: Clarifies that 'TextFileSource' only suppo...

2016-07-15 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/670

Clarifies that 'TextFileSource' only supports UTF-8 and ASCII encodings.
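
As a rough illustration of why UTF-8 and ASCII can be supported together, the sketch below decodes bytes as UTF-8 using only the Python standard library. It makes no claim about how 'TextFileSource' decodes data internally, and 'decode_line' is a hypothetical helper.

```python
def decode_line(raw_bytes):
    """Decode one line of file content as UTF-8 (ASCII is a subset of UTF-8)."""
    return raw_bytes.decode("utf-8")


# Pure ASCII bytes are valid UTF-8, so they decode unchanged.
ascii_line = decode_line(b"plain ascii")

# Non-ASCII text decodes correctly when it was written as UTF-8.
utf8_line = decode_line("caf\u00e9".encode("utf-8"))

# The same text written as Latin-1 is not valid UTF-8 and is rejected.
try:
    decode_line("caf\u00e9".encode("latin-1"))
    latin1_rejected = False
except UnicodeDecodeError:
    latin1_rejected = True
```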



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam utf8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #670


commit d829fc82fd78cd3fefb672381e5580b30bf13ff3
Author: Chamikara Jayalath 
Date:   2016-07-15T21:12:16Z

Clarifies that 'TextFileSource' only supports UTF-8 and ASCII encodings.






[GitHub] incubator-beam pull request #667: [BEAM-455] Adds a test harnesses and utili...

2016-07-15 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/667

[BEAM-455] Adds a test harness and utilities framework for sources.

Helper functions and test harnesses for checking the correctness of source
(``iobase.BoundedSource``) and range tracker (``iobase.RangeTracker``)
implementations.

Contains a few lightweight utilities (e.g. ``readFromSource()``, which reads
items from a source), as well as heavyweight property-testing and
stress-testing harnesses that help achieve a large amount of test coverage
with little code.

The most notable ones are:
* ``assertSourcesEqualReferenceSource()`` helps test that the data read by
the union of sources produced by ``BoundedSource.split()`` is the same as
the data read by the original source.
* If your source implements dynamic work rebalancing, use the
``assertSplitAtFraction()`` family of functions. They test the behavior of
``RangeTracker.try_split()``, in particular that various consistency
properties are respected and that the total set of data read by the source
is preserved when splits happen. Use ``assertSplitAtFractionBehavior()`` to
test individual cases of dynamic work rebalancing and use
``assertSplitAtFractionExhaustive()`` as a heavyweight stress test that
includes concurrency.
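
The property checked by the first helper (reading all splits yields the same data as reading the original source) can be sketched in plain Python. This is not the framework's code: 'read_range', 'split' and 'assert_splits_equal_reference' are hypothetical names, and a list of records stands in for a real source.

```python
def read_range(records, start, stop):
    """Read the records of a toy source restricted to [start, stop)."""
    return records[start:stop]


def split(start, stop, bundle_size):
    """Split [start, stop) into consecutive ranges of at most bundle_size."""
    return [(s, min(s + bundle_size, stop))
            for s in range(start, stop, bundle_size)]


def assert_splits_equal_reference(records, bundle_size):
    """Check that the union of all splits reads the same data as the whole."""
    reference = read_range(records, 0, len(records))
    from_splits = []
    for start, stop in split(0, len(records), bundle_size):
        from_splits.extend(read_range(records, start, stop))
    # Order across splits is not assumed to matter, so compare sorted copies.
    assert sorted(from_splits) == sorted(reference)
    return from_splits


split_data = assert_splits_equal_reference(list("abcdefg"), 3)
```

A split function that dropped or duplicated a boundary record would make the assertion fail, which is the kind of bug this style of check is meant to catch.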

Updates 'avroio_test.py' to use the test framework.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam testingframework_l

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #667


commit b3ac40ecca86f5952bd5ac3799e55e3e2a1a72a6
Author: Chamikara Jayalath 
Date:   2016-07-15T18:07:48Z

Adds a test harness and utilities framework for sources.

Helper functions and test harnesses for checking the correctness of source
(``iobase.BoundedSource``) and range tracker (``iobase.RangeTracker``)
implementations.

Contains a few lightweight utilities (e.g. ``readFromSource()``, which
reads items from a source), as well as heavyweight property-testing and
stress-testing harnesses that help achieve a large amount of test
coverage with little code.

The most notable ones are:
* ``assertSourcesEqualReferenceSource()`` helps test that the data
read by the union of sources produced by ``BoundedSource.split()`` is
the same as the data read by the original source.
* If your source implements dynamic work rebalancing, use the
``assertSplitAtFraction()`` family of functions. They test the behavior
of ``RangeTracker.try_split()``, in particular that various consistency
properties are respected and that the total set of data read by the
source is preserved when splits happen. Use
``assertSplitAtFractionBehavior()`` to test individual cases of
``splitAtFraction()`` and use ``assertSplitAtFractionExhaustive()`` as a
heavyweight stress test that includes concurrency.






[GitHub] incubator-beam pull request #599: [BEAM-360] Some updates related to dynamic...

2016-07-15 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/599




[GitHub] incubator-beam pull request #599: [BEAM-360] Some updates related to dynamic...

2016-07-08 Thread chamikaramj
GitHub user chamikaramj reopened a pull request:

https://github.com/apache/incubator-beam/pull/599

[BEAM-360] Some updates related to dynamic work rebalancing of custom 
sources.

Adds a class 'iobase.BoundedSourceSplit' to represent the results of dynamic
work rebalancing of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.
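
A minimal sketch of an integer-valued 'position_at_fraction()' follows. It is an illustration only: the free function, its parameters and the choice to round up are assumptions, not the 'OffsetRangeTracker' implementation, and Python 3's 'int' stands in for the 'long' type named above.

```python
import math


def position_at_fraction(start, stop, fraction):
    """Return the integer offset at `fraction` of the range [start, stop)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1], got %r" % fraction)
    # Offsets index whole records or bytes, so a fractional position has no
    # meaning. Rounding up is an assumption made for this sketch.
    return int(math.ceil(start + fraction * (stop - start)))


quarter = position_at_fraction(0, 10, 0.25)
middle = position_at_fraction(100, 200, 0.5)
```

Returning an integer means the result can be used directly as a split offset without each caller choosing its own rounding.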

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #599


commit 19a41ccf5bcf00192e3646258eae0cbce85da23b
Author: Chamikara Jayalath 
Date:   2016-07-07T03:25:04Z

Adds a class 'iobase.BoundedSourceSplit' to represent the result of dynamic
work rebalancing of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

commit 4415989ef0dfd656643e6e8575b6e2090b4437b5
Author: Chamikara Jayalath 
Date:   2016-07-07T03:34:21Z

Adds more comments.

commit 6aa697465e88f827a3121a1de8bad1b810d904da
Author: Chamikara Jayalath 
Date:   2016-07-07T04:41:20Z

Some updates related to dynamic work rebalancing of custom sources.

Adds a class 'iobase.BoundedSourceSplit' to represent the result of dynamic
work rebalancing of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

commit 1e01b1f5cd70e5b39cd064577110898c623e524a
Author: Chamikara Jayalath 
Date:   2016-07-08T19:01:42Z

Reverting some updates.

commit 171df1ecedd51c7c72db309d526dfa9badf1
Author: Chamikara Jayalath 
Date:   2016-07-08T22:34:52Z

Adds a method 'fileio.ChannelFactory.size_in_bytes()' that can be used to
determine the size of a single file.

Updates 'filebasedsource' to use this method when determining the size of files.






[GitHub] incubator-beam pull request #599: [BEAM-360] Some updates related to dynamic...

2016-07-08 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/599




[GitHub] incubator-beam pull request #599: [BEAM-360] Some updates related to dynamic...

2016-07-06 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/599

[BEAM-360] Some updates related to dynamic work rebalancing of custom 
sources.

Adds a class 'iobase.BoundedSourceSplit' to represent the results of dynamic
work rebalancing of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #599


commit e51d4acf12133a79671c567c9ff709c941c54f8c
Author: Chamikara Jayalath 
Date:   2016-06-21T01:09:50Z

Implements a framework for developing sources for new file types.

Module 'filebasedsource' provides a framework for creating sources for new
file types. This framework readily implements several features common to many
sources based on files.

Additionally, module 'avroio' contains a new source, 'AvroSource', that is 
implemented using the framework described above. 'AvroSource' is a source for 
reading Avro files.

Adds many unit tests for 'filebasedsource' and 'avroio' modules.

commit cacb613448b47592f8415570f7b64bc6de797f91
Author: Chamikara Jayalath 
Date:   2016-07-07T03:25:04Z

Adds a class 'iobase.BoundedSourceSplit' to represent the result of dynamic
work rebalancing of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

commit 264b4afc17c255e568a490e02ce47e9fb4b1e17a
Author: Chamikara Jayalath 
Date:   2016-07-07T03:34:21Z

Adds more comments.

commit 49e097f9c5c3d8c2bca48d3416b4934a4d86ed34
Author: Chamikara Jayalath 
Date:   2016-07-07T04:41:06Z

Some updates related to dynamic work rebalancing of custom sources.

Adds a class 'iobase.BoundedSourceSplit' to represent the result of dynamic
work rebalancing of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

commit c9696c9e17c9c7a6fc13d53d4da21ac9b325c73c
Author: Chamikara Jayalath 
Date:   2016-07-07T04:41:20Z

Some updates related to dynamic work rebalancing of custom sources.

Adds a class 'iobase.BoundedSourceSplit' to represent the result of dynamic
work rebalancing of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.






[GitHub] incubator-beam pull request #565: [BEAM-393] Adds more code snippets for Pyt...

2016-06-29 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/565

[BEAM-393] Adds more code snippets for Python SDK

Adds code snippets related to the following:

(1) Creating and using a new custom source
(2) Creating and using a new custom sink
(3) Performing joins using side inputs
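
The idea behind item (3) can be sketched in plain Python, without the SDK: the smaller collection is made available in full as a lookup table (the role a side input plays), while the main collection is processed one element at a time. The function name and the sample data are hypothetical and do not come from the snippets added by this pull request.

```python
def join_with_side_input(main_records, side_records):
    """Join (key, value) pairs against a small, fully materialized table."""
    # The side collection is assumed to be small enough to hold in memory.
    lookup = dict(side_records)
    for key, value in main_records:
        if key in lookup:
            yield key, (value, lookup[key])


emails = [("amy", "amy@example.com"), ("bob", "bob@example.com")]
phones = [("amy", "555-0100")]

joined = list(join_with_side_input(emails, phones))
```

Keys missing from the side table are dropped here, which makes this an inner join. Emitting them with a default value instead would give a left join.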



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam update_snippets2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #565


commit dfe0af624a43c0b6df23dc078bd5af300ed8848f
Author: Chamikara Jayalath 
Date:   2016-06-25T08:53:12Z

Adds code snippets for custom sources and sinks

commit 9dc3ce0c2d6d8f596a65c47fc64f66daae1a9b87
Author: Chamikara Jayalath 
Date:   2016-06-28T21:01:43Z

Adds code snippets for custom sources and sinks.

commit 647013d9d4393d89b3026e5150ddefd59740e4f8
Author: Chamikara Jayalath 
Date:   2016-06-28T23:28:23Z

Adds code snippets for custom sources and sinks.

commit 312c2c13aac13869851f5fe9f6bca27962f22b5b
Author: Chamikara Jayalath 
Date:   2016-06-28T23:33:22Z

Adds code snippets for custom sources and sinks.

commit 06ef713fe20945c35021d63081104f2dbe6115aa
Author: Chamikara Jayalath 
Date:   2016-06-29T22:30:22Z

Adds code snippets for custom sources and sinks.

commit c5a916940af6176a12380246224320541a0af2b0
Author: Chamikara Jayalath 
Date:   2016-06-29T23:21:02Z

Adds code snippets for custom sources, custom sinks, and joining using side 
inputs.






[GitHub] incubator-beam pull request #507: [BEAM-360] Implements a framework for deve...

2016-06-21 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/507




[GitHub] incubator-beam pull request #507: [BEAM-360] Implements a framework for deve...

2016-06-20 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/507

[BEAM-360] Implements a framework for developing Python SDK sources for new 
file types

Module 'filebasedsource' provides a framework for creating sources for new
file types. This framework implements several features common to many sources
based on files.

Additionally, module 'avroio' contains a new source, 'AvroSource', that is 
implemented using the framework described above. 'AvroSource' is a source for 
reading Avro files.

Adds many unit tests for 'filebasedsource' and 'avroio' modules.
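
The division of labor described above (a shared base handles what is common to file-based sources, and each file type supplies only its record parsing) might look roughly like the sketch below. The message does not say which features the real framework provides, so glob expansion and size estimation are assumptions here. 'FileBasedSourceSketch' and 'LineSource' are hypothetical names, not the classes in the 'filebasedsource' module.

```python
import glob
import os
import tempfile


class FileBasedSourceSketch:
    """Hypothetical base class holding logic shared by file-based sources."""

    def __init__(self, file_pattern):
        self._pattern = file_pattern

    def _files(self):
        return sorted(glob.glob(self._pattern))

    def estimate_size(self):
        """Total size in bytes of all files matching the pattern."""
        return sum(os.path.getsize(name) for name in self._files())

    def read_records(self, file_name):
        """File-type specific parsing, supplied by each subclass."""
        raise NotImplementedError

    def read(self):
        for name in self._files():
            for record in self.read_records(name):
                yield record


class LineSource(FileBasedSourceSketch):
    """A new file type only has to say how one file is turned into records."""

    def read_records(self, file_name):
        with open(file_name, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


def demo():
    """Write two small files and read them back through the sketch."""
    with tempfile.TemporaryDirectory() as directory:
        for name, text in (("a.txt", "x\ny\n"), ("b.txt", "z\n")):
            path = os.path.join(directory, name)
            with open(path, "w", encoding="utf-8", newline="\n") as handle:
                handle.write(text)
        source = LineSource(os.path.join(directory, "*.txt"))
        return list(source.read()), source.estimate_size()


records, total_bytes = demo()
```

A source for another file type would subclass the base in the same way and override only 'read_records()'.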

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam filebasedsource

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #507


commit e51d4acf12133a79671c567c9ff709c941c54f8c
Author: Chamikara Jayalath 
Date:   2016-06-21T01:09:50Z

Implements a framework for developing sources for new file types.

Module 'filebasedsource' provides a framework for creating sources for new
file types. This framework readily implements several features common to many
sources based on files.

Additionally, module 'avroio' contains a new source, 'AvroSource', that is 
implemented using the framework described above. 'AvroSource' is a source for 
reading Avro files.

Adds many unit tests for 'filebasedsource' and 'avroio' modules.



