+1 for generalizing the header-passing logic of TextIO to support other formats such as VCF and CSV. I think it would still be useful to have a VcfIO, though, that reads the header plus the lines for a bundle and produces a PCollection of VCF record protos.
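Here is a minimal plain-Python sketch, not tied to Beam, of the "parse VCF from the header and lines as a library" idea raised in this thread. It assumes the reader (an enhanced TextIO or a new file-based source) passes the '#'-prefixed header lines separately from the data lines. The names `VcfHeader`, `VcfRecord`, `parse_header` and `parse_record` are hypothetical. Plain dataclasses stand in for the record protos. Only the fixed VCF columns are parsed; INFO values and per-sample genotype strings are left as raw strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VcfHeader:
    """'##' meta-information lines plus sample names from the '#CHROM' line."""
    meta_lines: List[str]
    sample_names: List[str]


@dataclass
class VcfRecord:
    """One VCF data line with light parsing of the fixed columns."""
    chrom: str
    pos: int
    ids: List[str]
    ref: str
    alts: List[str]
    qual: Optional[float]
    filters: List[str]
    info: Dict[str, Optional[str]]
    # Raw per-sample strings keyed by sample name; decoding them needs FORMAT.
    samples: Dict[str, str] = field(default_factory=dict)


def parse_header(header_lines: List[str]) -> VcfHeader:
    """Builds a VcfHeader from the lines that start with '#'."""
    meta = [line for line in header_lines if line.startswith("##")]
    column_lines = [line for line in header_lines if line.startswith("#CHROM")]
    if len(column_lines) != 1:
        raise ValueError("expected exactly one '#CHROM' header line")
    columns = column_lines[0].rstrip("\n").split("\t")
    # Columns 0-7 are fixed, column 8 is FORMAT, the rest are sample names.
    return VcfHeader(meta_lines=meta, sample_names=columns[9:])


def _split_or_empty(value: str, sep: str) -> List[str]:
    """VCF uses '.' for a missing value; map it to an empty list."""
    return [] if value == "." else value.split(sep)


def parse_record(line: str, header: VcfHeader) -> VcfRecord:
    """Parses one tab-separated VCF data line using the already-read header."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, ids, ref, alts, qual, filters, info = fields[:8]

    info_map: Dict[str, Optional[str]] = {}
    for entry in _split_or_empty(info, ";"):
        key, sep, value = entry.partition("=")
        # Flag entries such as 'DB' carry no '=value' part.
        info_map[key] = value if sep else None

    # zip() pairs sample names with columns 9+ and stops at the shorter list.
    samples = dict(zip(header.sample_names, fields[9:]))

    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ids=_split_or_empty(ids, ";"),
        ref=ref,
        alts=_split_or_empty(alts, ","),
        qual=None if qual == "." else float(qual),
        filters=_split_or_empty(filters, ";"),
        info=info_map,
        samples=samples,
    )
```

In a pipeline, one option is to call `parse_record` from a `Map` step and supply the parsed header as a side input. Whether that logic is packaged as a dedicated VcfIO or sits on top of an enhanced TextIO is the open question in this thread.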
Thanks,
Cham

On Thu, Aug 17, 2017 at 11:23 AM Eugene Kirpichov wrote:
> I really like the idea of generalizing TextIO to be able to read a file
> header while still reading the rest of the contents in parallel. People
> have long been asking for this for CSV. If we add that, a special VcfIO
> will not be necessary because you'll be able to just use the enhanced
> TextIO and parse VCF from the lines and headers (granted, it still makes
> sense to have this as a library, just not necessarily packaged as a
> PTransform).
>
> On Thu, Aug 17, 2017, 10:35 AM Reuven Lax wrote:
>
> > I think this approach should not be that hard. We need to see if some of
> > the code in TextSource needs to be refactored, as TextSource is currently
> > package private.
> >
> > On Wed, Aug 16, 2017 at 12:04 PM, Chamikara Jayalath wrote:
> >
> > > Thanks for proposing this.
> > >
> > > I left some comments. My main concern is the possible complexity this
> > > might add to textio and the potential performance impact. So at this
> > > point I would prefer that this be implemented as a new filebasedsource
> > > instead of updating textio. I'm open to being convinced otherwise :).
> > >
> > > Thanks,
> > > Cham
> > >
> > > On Wed, Aug 16, 2017 at 11:01 AM Eugene Kirpichov wrote:
> > >
> > > > +Chamikara Jayalath
> > > > You may also find the recent discussion on WholeFileIO useful:
> > > >
> > > > https://lists.apache.org/thread.html/6ea193b7178f8ab44de5562bfdd94dc3fe740bc440e8a05e533e40cf@%3Cdev.beam.apache.org%3E
> > > > https://github.com/apache/beam/pull/3543 (I think the bulk of the
> > > > discussion happened there)
> > > > https://github.com/apache/beam/pull/3717
> > > >
> > > > On Wed, Aug 16, 2017 at 10:58 AM Jean-Baptiste Onofré wrote:
> > > >
> > > > > I will, thanks!
> > > > >
> > > > > Regards
> > > > > JB
> > > > >
> > > > > On Aug 16, 2017, at 18:53, Asha Rostamianfar wrote:
> > > > > > Hi everyone,
> > > > > >
> > > > > > I have a proposal to add a new built-in I/O source for VCF files:
> > > > > >
> > > > > > https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
> > > > > >
> > > > > > I'm planning to take on the implementation work myself, but wanted
> > > > > > to get preliminary feedback on the proposed design, as it requires
> > > > > > making changes to the existing TextIO. I will file a JIRA FR as well.
> > > > > >
> > > > > > Please take a look at the doc and feel free to comment.
> > > > > >
> > > > > > Thanks,
> > > > > > Asha
