Chamikara Jayalath created BEAM-360:
---------------------------------------

             Summary: Add a framework for creating Python-SDK sources for new file types
                 Key: BEAM-360
                 URL: https://issues.apache.org/jira/browse/BEAM-360
             Project: Beam
          Issue Type: New Feature
          Components: sdk-py
            Reporter: Chamikara Jayalath
            Assignee: Chamikara Jayalath


We already have a framework for creating new sources for the Beam Python SDK:
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326

It would be great if we could add a framework on top of this that encapsulates 
logic common to file-based sources. This framework could include the 
following features:
(1) glob expansion
(2) support for new file systems
(3) dynamic work rebalancing based on byte offsets
(4) support for reading compressed files
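
To make features (1), (3) and (4) concrete, here is a minimal sketch that uses only the Python standard library. It is not the Beam API: FileRange, expand_glob, split_file, split_pattern, try_split and open_file are hypothetical names chosen for illustration, only the local file system is handled (so feature (2) is left out), and compression is detected from a ".gz" suffix only.

```python
import glob
import gzip
import os
from collections import namedtuple

# Hypothetical unit of work: the byte range [start, stop) of one file.
FileRange = namedtuple('FileRange', ['path', 'start', 'stop'])


def expand_glob(pattern):
    """Feature (1): expand a file pattern into a sorted list of paths."""
    return sorted(glob.glob(pattern))


def is_compressed(path):
    """Feature (4), simplified: assume a .gz suffix means gzip."""
    return path.endswith('.gz')


def split_file(path, desired_bundle_size):
    """Cut one file into byte ranges of at most desired_bundle_size bytes.

    A gzip file cannot be read from an arbitrary byte offset, so it is
    kept as a single range covering the whole file.
    """
    size = os.path.getsize(path)
    if is_compressed(path) or size == 0:
        return [FileRange(path, 0, size)]
    return [FileRange(path, start, min(start + desired_bundle_size, size))
            for start in range(0, size, desired_bundle_size)]


def split_pattern(pattern, desired_bundle_size):
    """Initial splitting: glob expansion followed by per-file splitting."""
    ranges = []
    for path in expand_glob(pattern):
        ranges.extend(split_file(path, desired_bundle_size))
    return ranges


def try_split(file_range, consumed_offset, split_offset):
    """Feature (3), simplified: split a range that is already being read.

    Returns (primary, residual) if split_offset lies strictly between the
    last consumed offset and the end of the range, otherwise None.
    """
    if not (consumed_offset < split_offset < file_range.stop):
        return None
    primary = FileRange(file_range.path, file_range.start, split_offset)
    residual = FileRange(file_range.path, split_offset, file_range.stop)
    return primary, residual


def open_file(path):
    """Feature (4): open a file in binary mode, decompressing gzip."""
    if is_compressed(path):
        return gzip.open(path, 'rb')
    return open(path, 'rb')
```

A concrete source for a new file type would then only need to supply the logic for reading records from a single FileRange. A real splitter would also have to find record boundaries near each offset, which this sketch ignores.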

The Java SDK has a similar framework, available at:
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java



