[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368706#comment-15368706 ]
ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj reopened a pull request:

    https://github.com/apache/incubator-beam/pull/599

    [BEAM-360] Some updates related to dynamic work rebalancing of custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing results of custom sources.
    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing of custom sources.
    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #599

----
commit 19a41ccf5bcf00192e3646258eae0cbce85da23b
Author: Chamikara Jayalath <chamik...@apache.org>
Date:   2016-07-07T03:25:04Z

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources.
    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing of custom sources.
    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

commit 4415989ef0dfd656643e6e8575b6e2090b4437b5
Author: Chamikara Jayalath <chamik...@apache.org>
Date:   2016-07-07T03:34:21Z

    Adds more comments.

commit 6aa697465e88f827a3121a1de8bad1b810d904da
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2016-07-07T04:41:20Z

    Some updates related to dynamic work rebalancing of custom sources.
    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources.
    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing of custom sources.
    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

commit 1e01b1f5cd70e5b39cd064577110898c623e524a
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2016-07-08T19:01:42Z

    Reverting some updates.

commit 171df1ec3333edd51c7c72db309d526dfa9badf1
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2016-07-08T22:34:52Z

    Adds a method 'fileio.ChannelFactory.size_in_bytes()' that can be used to determine the size of a single file.
    Updates 'filebasedsource' to use this method when determining the size of files.

----

> Add a framework for creating Python-SDK sources for new file types
> ------------------------------------------------------------------
>
>                 Key: BEAM-360
>                 URL: https://issues.apache.org/jira/browse/BEAM-360
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> We already have a framework for creating new sources for the Beam Python SDK:
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326
> It would be great if we could add a framework on top of this that encapsulates
> logic common to sources that are based on files. This framework can include
> the following features that are common to file-based sources:
> (1) glob expansion
> (2) support for new file systems
> (3) dynamic work rebalancing based on byte offsets
> (4) support for reading compressed files.
> The Java SDK has a similar framework, available at:
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
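Why 'position_at_fraction()' should return an integer offset, and how that feeds dynamic work rebalancing based on byte offsets, can be illustrated with a minimal sketch. This is NOT the Beam SDK's 'OffsetRangeTracker': the class below is a hypothetical, simplified stand-in, and its 'try_claim()'/'try_split()' methods and their return values are assumptions made for illustration. The residual tuple returned by 'try_split()' only plays the role of a split result; it is not 'iobase.BoundedSourceSplit'. Python 3 merged Python 2's separate 'long' type into 'int', so the sketch returns an 'int'.

```python
class OffsetRangeTracker(object):
    """Hypothetical, simplified tracker for a byte range [start, stop).

    Not the Beam SDK class: it only illustrates why a position derived from a
    fraction should be an integer byte offset rather than a float.
    """

    def __init__(self, start, stop):
        if start >= stop:
            raise ValueError('empty range [%d, %d)' % (start, stop))
        self.start = start
        self.stop = stop
        self.last_claimed = None  # offset of the last record the reader claimed

    def try_claim(self, offset):
        """Claims the record starting at `offset`; returns False once past stop."""
        if self.last_claimed is not None and offset <= self.last_claimed:
            raise ValueError('offsets must be strictly increasing')
        if offset >= self.stop:
            return False
        self.last_claimed = offset
        return True

    def position_at_fraction(self, fraction):
        """Maps a fraction in [0, 1] of the current range to an integer offset."""
        # int() truncates the float product, so callers always get a whole
        # byte offset that can be used as a seek position or split point.
        return self.start + int(fraction * (self.stop - self.start))

    def try_split(self, fraction):
        """Shrinks this range at `fraction` (dynamic work rebalancing).

        Returns the residual range (split, old_stop) to hand to another worker,
        or None if the split point is outside the range or already consumed.
        """
        split = self.position_at_fraction(fraction)
        if split <= self.start or split >= self.stop:
            return None
        if self.last_claimed is not None and split <= self.last_claimed:
            return None
        residual = (split, self.stop)
        self.stop = split  # this tracker keeps [start, split)
        return residual


tracker = OffsetRangeTracker(0, 1000)
tracker.try_claim(100)
print(tracker.position_at_fraction(0.25))  # 250
print(tracker.try_split(0.5))              # (500, 1000)
print(tracker.try_claim(600))              # False: 600 now belongs to the residual
```

Returning a whole offset means the split point can be used directly as the boundary between the primary range [start, split) and the residual range [split, old_stop) without any further rounding by the caller.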