Re: Proposal: file-based IOs should support readAllMatches()

2017-09-03 Eugene Kirpichov
The PR is in. Now you can write code like the following to use XmlIO to watch for new files, even though XmlIO itself does not support this.
PCollection files = p
    .apply(FileIO.match().filepattern(options.getInputFilepatternProvider()).continuously(
        Duration.standardSeconds(30), af…
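The idea behind `.continuously(...)` is to re-run the match on an interval and emit only files that were not seen in earlier rounds. The plain-Java sketch below models one polling round without Beam. `NewFileWatcher`, `pollOnce`, and the injected listing `Supplier` are hypothetical names chosen for illustration; the sketch does not model termination conditions or fault tolerance, which the real transform has to handle.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical model of continuous matching (not Beam's implementation):
// each poll re-lists candidate files, keeps those matching the glob,
// and emits only the ones that have not been emitted before.
final class NewFileWatcher {
    private final PathMatcher matcher;
    private final Supplier<List<String>> lister; // stands in for listing a directory
    private final Set<String> seen = new HashSet<>();

    NewFileWatcher(String glob, Supplier<List<String>> lister) {
        this.matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        this.lister = lister;
    }

    // One polling round; a real watcher would call this on a timer,
    // e.g. every 30 seconds as in the snippet above.
    List<String> pollOnce() {
        List<String> fresh = new ArrayList<>();
        for (String name : lister.get()) {
            // Set.add returns false for names already emitted in an earlier round.
            if (matcher.matches(Paths.get(name)) && seen.add(name)) {
                fresh.add(name);
            }
        }
        return fresh;
    }
}
```

Keeping the set of already-seen names is what turns a repeated batch match into a stream of new files; a downstream reader such as XmlIO then only needs to handle each file once.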

Re: Proposal: file-based IOs should support readAllMatches()

2017-08-31 Eugene Kirpichov
I sent a PR about all this: https://github.com/apache/beam/pull/3799

Re: Proposal: file-based IOs should support readAllMatches()

2017-08-28 Eugene Kirpichov
Thanks. I think I agree that file-based IOs (at least widely used ones) should, for convenience, still provide FooIO.read().from(filepattern), and, for performance until SDF (Splittable DoFn) has full support in all runners, implement it via a BoundedSource. The second case with Create.of(filepattern) illustrates wh…
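Semantically, FooIO.read().from(filepattern) is the single-pattern special case of reading a collection of filepatterns, which is what the Create.of(filepattern) form spells out. The plain-Java sketch below shows only that equivalence; `FooIOModel`, its matcher and reader functions are hypothetical, and, as noted above, a real IO may still implement read() through a BoundedSource for performance rather than by delegating like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a file-based IO: readAll consumes many filepatterns,
// and read(filepattern) is the convenience form for exactly one pattern.
final class FooIOModel {
    private final Function<String, List<String>> matcher; // filepattern -> matched file names
    private final Function<String, List<String>> reader;  // file name -> records in that file

    FooIOModel(Function<String, List<String>> matcher, Function<String, List<String>> reader) {
        this.matcher = matcher;
        this.reader = reader;
    }

    // Expand every filepattern, then read every matched file.
    List<String> readAll(List<String> filepatterns) {
        List<String> records = new ArrayList<>();
        for (String pattern : filepatterns) {
            for (String file : matcher.apply(pattern)) {
                records.addAll(reader.apply(file));
            }
        }
        return records;
    }

    // Same result as feeding a one-element list into readAll.
    List<String> read(String filepattern) {
        return readAll(List.of(filepattern));
    }
}
```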

Re: Proposal: file-based IOs should support readAllMatches()

2017-08-28 Etienne Chauchot
Hi Eugene, +1 to this; it is nice to add this common behavior to all the file-based IOs. I find the design elegant. I just have one minor API comment: I would prefer p.apply(FooIO.read().from(filepattern)) to p.apply(Create.of(filepattern)). IMHO, it is more readable and analogous to the ot…

Re: Proposal: file-based IOs should support readAllMatches()

2017-08-25 Eugene Kirpichov
I think I have a somewhat better proposal that encompasses this and WholeFileIO. I'm already moving Match.filepatterns() into FileIO.match()/matchAll(), and I'd like to create FileIO.read(): PCollection -> PCollection, potentially configurable by .withCompression(). Here, ReadableFile will be a n…
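A ReadableFile, as proposed here, pairs a matched file with the knowledge of how to decode it, so downstream code can open it without re-implementing decompression. The plain-Java sketch below models that with an in-memory byte array and GZIP only; the `Compression` enum, the constructor, `open()`, `readFullyAsUtf8()`, and the `gzip` helper are all hypothetical and are not the API of any Beam class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical compression setting, standing in for .withCompression(...).
enum Compression { UNCOMPRESSED, GZIP }

// Hypothetical ReadableFile: file name, raw (possibly compressed) bytes,
// and the compression to undo when opening. Bytes live in memory so the
// sketch needs no real filesystem.
final class ReadableFile {
    private final String name;
    private final byte[] rawBytes;
    private final Compression compression;

    ReadableFile(String name, byte[] rawBytes, Compression compression) {
        this.name = name;
        this.rawBytes = rawBytes.clone();
        this.compression = compression;
    }

    String getName() {
        return name;
    }

    // Returns a stream of decompressed bytes.
    InputStream open() throws IOException {
        InputStream raw = new ByteArrayInputStream(rawBytes);
        return compression == Compression.GZIP ? new GZIPInputStream(raw) : raw;
    }

    String readFullyAsUtf8() {
        try (InputStream in = open()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: GZIP-compress a UTF-8 string.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // stream closed above, so the GZIP trailer is written
    }
}
```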

Re: Proposal: file-based IOs should support readAllMatches()

2017-08-18 Chamikara Jayalath
+1 for this. Also, it looks like IO authors should be able to use the existing 'ReadAllViaFileBasedSource' transform when implementing FooIO.readAllMatches(). - Cham
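Roughly, a transform like 'ReadAllViaFileBasedSource' takes the matched files, splits each one into byte ranges of a desired bundle size, and reads each range with the IO's own file-based source, so an IO author only has to supply that source. The plain-Java sketch below models just the splitting step; `FileRangeSplitter`, `Range`, and the `splittable` flag are hypothetical, with the flag standing in for inputs such as compressed files that can only be read as one whole range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of splitting matched files into byte ranges so that
// large files can be read in parallel bundles.
final class FileRangeSplitter {
    static final class Range {
        final String file;
        final long start; // inclusive
        final long end;   // exclusive

        Range(String file, long start, long end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return file + "[" + start + ", " + end + ")";
        }
    }

    static List<Range> split(String file, long sizeBytes, long desiredBundleSizeBytes, boolean splittable) {
        if (desiredBundleSizeBytes <= 0) {
            throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        if (!splittable) {
            // e.g. a compressed file: one reader must consume it end to end.
            ranges.add(new Range(file, 0, sizeBytes));
            return ranges;
        }
        for (long start = 0; start < sizeBytes; start += desiredBundleSizeBytes) {
            ranges.add(new Range(file, start, Math.min(start + desiredBundleSizeBytes, sizeBytes)));
        }
        return ranges;
    }
}
```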

Proposal: file-based IOs should support readAllMatches()

2017-08-18 Eugene Kirpichov
Hi all, I've been adding new features to TextIO and AvroIO recently, see e.g. https://github.com/apache/beam/pull/3725. The features are:
- withHintMatchesManyFiles()
- readAll() that reads a PCollection of filepatterns
- configurable treatment of filepatterns that match no files
- watchForNewFile…
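The "configurable treatment of filepatterns that match no files" item amounts to a small policy check applied after matching. Beam's Java SDK has an `EmptyMatchTreatment` enum with the values ALLOW, DISALLOW, and ALLOW_IF_WILDCARD; the plain-Java sketch below is a simplified stand-in, not its implementation. `MatchChecker` is hypothetical, and treating any of `*?[{` as a wildcard is an assumption of this sketch.

```java
import java.util.List;

// Simplified stand-in for Beam's EmptyMatchTreatment values.
enum EmptyMatchTreatment { ALLOW, DISALLOW, ALLOW_IF_WILDCARD }

final class MatchChecker {
    // Assumption: any of these glob metacharacters marks a wildcard pattern.
    static boolean hasWildcard(String filepattern) {
        return filepattern.chars().anyMatch(c -> "*?[{".indexOf(c) >= 0);
    }

    // Returns the matches unchanged, or throws if an empty match is not allowed.
    static List<String> check(String filepattern, List<String> matches, EmptyMatchTreatment treatment) {
        if (!matches.isEmpty()) {
            return matches;
        }
        boolean allowed;
        switch (treatment) {
            case ALLOW:
                allowed = true;
                break;
            case DISALLOW:
                allowed = false;
                break;
            default: // ALLOW_IF_WILDCARD: an empty glob is fine, a missing exact file is not
                allowed = hasWildcard(filepattern);
                break;
        }
        if (!allowed) {
            throw new IllegalStateException("No files matched " + filepattern);
        }
        return matches;
    }
}
```

ALLOW_IF_WILDCARD is a useful middle ground: a pattern like `logs/*.txt` may legitimately match nothing yet, while an exact file name that matches nothing is more likely a mistake.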