PR out for review https://github.com/apache/beam/pull/3817
Next steps are to clean it up (in this PR) and implement sinks for Text, XML
and TFRecord (in subsequent PRs).

On Thu, Sep 7, 2017 at 9:57 AM Robert Bradshaw <[email protected]> wrote:

> Huge +1.
>
> This brings things more in line with Python's FileBasedSink where one
> simply overrides write[_encoded]_record and, usually, open/close. We
> may want to consider aligning the APIs. (And, of course bringing
> things like DynamicDestinations to Python.)
>
> On Wed, Sep 6, 2017 at 9:24 PM, Jean-Baptiste Onofré <[email protected]>
> wrote:
> > Fantastic.
> >
> > Big +1 for this.
> >
> > Regards
> > JB
> >
> >
> > On 09/07/2017 03:44 AM, Eugene Kirpichov wrote:
> >>
> >> Hi,
> >>
> >> Please take a look at the following proposal.
> >>
> >> I believe, together with the (already available) FileIO.match() and
> >> FileIO.readMatches() this proposal will empower Beam users to address all
> >> use cases of file-based IO I'm aware of - which makes me quite excited.
> >>
> >> http://s.apache.org/fileio-write
> >>
> >> *We propose a new API for writing files in Beam: FileIO.write(). It is more
> >> modular and cleaner to code against than FileBasedSink, and aims to
> >> completely replace it.*
> >>
> >> *FileIO.write() lets an IO author implement only logic and configuration
> >> specific to a particular file format (e.g. Avro) and automatically get all
> >> format-agnostic features, such as sharding, cleanup, windowed writes,
> >> DynamicDestinations, compression, returning the successfully written
> >> filenames, etc.*
> >>
> >> TL;DR:
> >>
> >> FileIO.write(FileSink<DestT, InputT> { open(dest), write(input), close() })
> >>     .to(input → dest)
> >>     .withFilenamePolicy(dest → prefix, shard pattern)
> >>     .withEverythingElse() // like in WriteFiles
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > [email protected]
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
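
For anyone skimming the thread, here is a rough, self-contained sketch of the split the TL;DR in the quoted proposal describes: the sink only knows the file format, and a format-agnostic driver handles routing and naming. This is plain Java with no Beam dependency and is not the code in the PR. The names FileIoWriteSketch, LineSink, write() and demo() are made up for illustration, and passing a StringBuilder to open() is an assumption, since the TL;DR does not say how the sink reaches the underlying file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

class FileIoWriteSketch {

    // Hypothetical stand-in for FileSink<DestT, InputT> from the TL;DR.
    // Handing the output buffer to open() is an assumption of this sketch.
    interface FileSink<DestT, InputT> {
        void open(DestT dest, StringBuilder out);
        void write(InputT input);
        void close();
    }

    // Format-specific logic only: a header line per destination,
    // then one line per record.
    static class LineSink implements FileSink<String, String> {
        private StringBuilder out;

        public void open(String dest, StringBuilder out) {
            this.out = out;
            out.append("# ").append(dest).append('\n');
        }

        public void write(String input) {
            out.append(input).append('\n');
        }

        public void close() {
            out = null;
        }
    }

    // Format-agnostic driver: routes each input to a destination (input -> dest),
    // names the output (dest -> filename), and drives open/write/close.
    // Returns filename -> contents, standing in for "written files".
    static <DestT, InputT> Map<String, String> write(
            List<InputT> inputs,
            Function<InputT, DestT> toDest,
            Function<DestT, String> filenamePolicy,
            Supplier<FileSink<DestT, InputT>> sinkFactory) {
        Map<DestT, List<InputT>> byDest = new LinkedHashMap<>();
        for (InputT input : inputs) {
            byDest.computeIfAbsent(toDest.apply(input), d -> new ArrayList<>()).add(input);
        }
        Map<String, String> files = new LinkedHashMap<>();
        for (Map.Entry<DestT, List<InputT>> e : byDest.entrySet()) {
            StringBuilder out = new StringBuilder();
            FileSink<DestT, InputT> sink = sinkFactory.get();
            sink.open(e.getKey(), out);
            for (InputT input : e.getValue()) {
                sink.write(input);
            }
            sink.close();
            files.put(filenamePolicy.apply(e.getKey()), out.toString());
        }
        return files;
    }

    // Example wiring: destination is the first letter of each word,
    // with a single fixed shard per destination.
    static Map<String, String> demo() {
        List<String> inputs = Arrays.asList("apple", "banana", "avocado");
        Function<String, String> toDest = s -> s.substring(0, 1);
        Function<String, String> filenamePolicy = dest -> "out/" + dest + "-00000-of-00001.txt";
        return write(inputs, toDest, filenamePolicy, LineSink::new);
    }

    public static void main(String[] args) {
        demo().forEach((name, contents) -> System.out.println(name + " ->\n" + contents));
    }
}
```

Sharding, windowing, compression and cleanup are left out on purpose. In the proposal those are exactly the format-agnostic features the IO author gets without writing them, so here they would all live in the driver rather than in LineSink.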
