[ 
https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368706#comment-15368706
 ] 

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj reopened a pull request:

    https://github.com/apache/incubator-beam/pull/599

    [BEAM-360] Some updates related to dynamic work rebalancing of custom
    sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing results of custom sources.
    
    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.
    
    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.
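
For illustration only, here is a minimal sketch of what an integer-valued 'position_at_fraction()' could look like. The class name 'SimpleOffsetRange' is hypothetical and is not the SDK's 'OffsetRangeTracker'; rounding up with 'math.ceil' is an assumption, since the pull request text only states that the result is a 'long' rather than a 'float'.

```python
import math


class SimpleOffsetRange(object):
    """Hypothetical stand-in for an offset range; not the SDK class."""

    def __init__(self, start, stop):
        if start > stop:
            raise ValueError("start must not exceed stop")
        self.start = start
        self.stop = stop

    def position_at_fraction(self, fraction):
        """Return the integer offset at the given fraction of the range.

        Rounding up is an assumption made for this sketch. Offsets are
        whole numbers (for example, byte positions), so a float result
        would not identify a valid position.
        """
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be within [0, 1]")
        span = self.stop - self.start
        # int() yields an arbitrary-precision integer in Python 3; in
        # Python 2 the equivalent large-value type was 'long'.
        return int(math.ceil(self.start + fraction * span))
```

With this sketch, SimpleOffsetRange(0, 10).position_at_fraction(0.25) returns the integer 3 rather than the float 2.5.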

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #599
    
----
commit 19a41ccf5bcf00192e3646258eae0cbce85da23b
Author: Chamikara Jayalath <chamik...@apache.org>
Date:   2016-07-07T03:25:04Z

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing result of custom sources.
    
    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.
    
    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.

commit 4415989ef0dfd656643e6e8575b6e2090b4437b5
Author: Chamikara Jayalath <chamik...@apache.org>
Date:   2016-07-07T03:34:21Z

    Adds more comments.

commit 6aa697465e88f827a3121a1de8bad1b810d904da
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2016-07-07T04:41:20Z

    Some updates related to dynamic work rebalancing of custom sources.
    
    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing result of custom sources.
    
    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.
    
    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.

commit 1e01b1f5cd70e5b39cd064577110898c623e524a
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2016-07-08T19:01:42Z

    Reverting some updates.

commit 171df1ec3333edd51c7c72db309d526dfa9badf1
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2016-07-08T22:34:52Z

    Adds a method 'fileio.ChannelFactory.size_in_bytes()' that can be used to
    determine the size of a single file.
    
    Updates 'filebasedsource' to use this method when determining size of files.

----


> Add a framework for creating Python-SDK sources for new file types
> ------------------------------------------------------------------
>
>                 Key: BEAM-360
>                 URL: https://issues.apache.org/jira/browse/BEAM-360
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> We already have a framework for creating new sources for the Beam Python SDK:
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326
> It would be great if we could add a framework on top of this that
> encapsulates logic common to file-based sources. This framework could
> include the following features, which are common to such sources:
> (1) glob expansion
> (2) support for new file-systems
> (3) dynamic work rebalancing based on byte offsets
> (4) support for reading compressed files.
> The Java SDK has a similar framework, available at:
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java
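
As a rough illustration of features (1) and (3) above, the sketch below expands a glob and cuts each matching file into byte-offset ranges. It uses only the Python standard library on the local file system; the function name 'expand_and_split' and the 'bundle_size' parameter are hypothetical and are not part of the Beam SDK. It shows only a static, up-front split, not dynamic rebalancing of work that is already running.

```python
import glob
import os


def expand_and_split(pattern, bundle_size):
    """Yield (path, start, stop) byte ranges for files matching pattern.

    Hypothetical helper for illustration; local files only. Each range
    covers bytes [start, stop) and is at most bundle_size bytes long.
    Empty files produce no ranges.
    """
    if bundle_size <= 0:
        raise ValueError("bundle_size must be positive")
    # Glob expansion: resolve the pattern to concrete file paths.
    for path in sorted(glob.glob(pattern)):
        size = os.path.getsize(path)
        # Byte-offset splitting: fixed-size ranges, with a shorter final
        # range when the size is not a multiple of bundle_size.
        for start in range(0, size, bundle_size):
            yield path, start, min(start + bundle_size, size)
```

A reader for a real file type would still need to align each range with record boundaries, since a byte offset can fall in the middle of a record.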



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
