Huge +1.

This brings things more in line with Python's FileBasedSink, where one
simply overrides write[_encoded]_record and, usually, open/close. We
may want to consider aligning the APIs. (And, of course, bringing
things like DynamicDestinations to Python.)
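
To make the comparison concrete, here is a rough sketch in plain Python of
the split being discussed: a format-specific sink that only knows
open/write/close, and a format-agnostic driver that handles routing and
file naming. This is not Beam's API in either language. LineSink,
write_files, and the to_dest/filename_for parameters are names invented
for illustration, and in-memory buffers stand in for real files.

```python
import io
from collections import defaultdict


class LineSink:
    """Hypothetical format-specific sink: it only knows how to write records."""

    def open(self, stream):
        # Keep a handle to the already-opened output channel.
        self._stream = stream

    def write(self, record):
        # Format-specific encoding: one record per line.
        self._stream.write(f"{record}\n")

    def close(self):
        # Nothing to flush for this trivial format; just drop the handle.
        self._stream = None


def write_files(records, sink_factory, to_dest, filename_for):
    """Hypothetical format-agnostic driver (assumption, not a Beam transform).

    Groups records by destination, drives one sink per destination, and
    returns a mapping of filename to written contents.
    """
    groups = defaultdict(list)
    for record in records:
        groups[to_dest(record)].append(record)

    outputs = {}
    for dest, group in groups.items():
        stream = io.StringIO()  # stands in for a real file
        sink = sink_factory()
        sink.open(stream)
        for record in group:
            sink.write(record)
        sink.close()
        outputs[filename_for(dest)] = stream.getvalue()
    return outputs


result = write_files(
    ["apple", "avocado", "banana"],
    sink_factory=LineSink,
    to_dest=lambda word: word[0],                 # like .to(input -> dest)
    filename_for=lambda dest: f"out-{dest}.txt",  # like a filename policy
)
```

The point of the shape is that supporting a new format would mean writing
only another class like LineSink; the grouping and naming logic stays
untouched.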

On Wed, Sep 6, 2017 at 9:24 PM, Jean-Baptiste Onofré <[email protected]> wrote:
> Fantastic.
>
> Big +1 for this.
>
> Regards
> JB
>
>
> On 09/07/2017 03:44 AM, Eugene Kirpichov wrote:
>>
>> Hi,
>>
>> Please take a look at the following proposal.
>>
>> I believe, together with the (already available) FileIO.match() and
>> FileIO.readMatches() this proposal will empower Beam users to address all
>> use cases of file-based IO I'm aware of - which makes me quite excited.
>>
>> http://s.apache.org/fileio-write
>>
>> *We propose a new API for writing files in Beam: FileIO.write(). It is more
>> modular and cleaner to code against than FileBasedSink, and aims to
>> completely replace it.*
>>
>> *FileIO.write() lets an IO author implement only logic and configuration
>> specific to a particular file format (e.g. Avro) and automatically get all
>> format-agnostic features, such as sharding, cleanup, windowed writes,
>> DynamicDestinations, compression, returning the successfully written
>> filenames, etc.*
>>
>> TL;DR:
>>
>> FileIO.write(FileSink<DestT, InputT> { open(dest), write(input), close() })
>>        .to(input → dest)
>>        .withFilenamePolicy(dest → prefix, shard pattern)
>>        .withEverythingElse() // like in WriteFiles
>>
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
