I really like the idea of generalizing TextIO to be able to read a file header while still reading the rest of the contents in parallel. People have long been asking for this for CSV. If we add that, a special VcfIO will not be necessary because you'll be able to just use the enhanced TextIO and parse VCF from the lines and headers (granted, it still makes sense to have this as a library, just not necessarily packaged as a PTransform).
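To illustrate why VCF parsing could live in a plain library once the reader exposes a file's header alongside its lines, here is a minimal, hypothetical Python sketch. It is not a Beam API: `split_header` and `parse_record` are illustrative names, and it assumes that all `#`-prefixed header lines come before the data lines, as they do in VCF. The point it shows is that once the header is known, each data line can be parsed on its own, which is what would let the rest of the file be read in parallel.

```python
from typing import Dict, Iterable, List, Tuple


def split_header(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: separate leading '#'-prefixed header lines
    from the data lines that follow them (assumes VCF-style layout)."""
    header: List[str] = []
    body: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if not body and line.startswith("#"):
            header.append(line)  # still inside the header block
        else:
            body.append(line)  # first data line ends the header
    return header, body


def parse_record(header: List[str], line: str) -> Dict[str, str]:
    """Hypothetical helper: map one tab-separated data line onto the
    column names from the last header line ('#CHROM\tPOS\t...')."""
    columns = header[-1].lstrip("#").split("\t")
    return dict(zip(columns, line.split("\t")))


lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14",
]
header, body = split_header(lines)
# Each data line depends only on the (already known) header, so in a
# pipeline these calls could run independently on different workers.
records = [parse_record(header, line) for line in body]
```

In a real source the header would be read once per file and passed to whatever parses the data lines; the sequential `split_header` here only stands in for that step.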
On Thu, Aug 17, 2017, 10:35 AM Reuven Lax wrote:

> I think this approach should not be that hard. We need to see if some of
> the code in TextSource needs to be refactored, as TextSource is currently
> package private.
>
> On Wed, Aug 16, 2017 at 12:04 PM, Chamikara Jayalath wrote:
>
> > Thanks for proposing this.
> >
> > I left some comments. My main concern is the possible complexity this might
> > add to textio and potential performance impact. So at this point I prefer
> > if this is implemented as a new filebasedsource instead of updating textio.
> > I'm open to being convinced otherwise :).
> >
> > Thanks,
> > Cham
> >
> > On Wed, Aug 16, 2017 at 11:01 AM Eugene Kirpichov wrote:
> >
> > > +Chamikara Jayalath
> > > Also you may find useful the recent discussion on WholeFileIO
> > >
> > > https://lists.apache.org/thread.html/6ea193b7178f8ab44de5562bfdd94dc3fe740bc440e8a05e533e40cf@%3Cdev.beam.apache.org%3E
> > > https://github.com/apache/beam/pull/3543 (I think bulk of discussion
> > > happened there)
> > > https://github.com/apache/beam/pull/3717
> > >
> > > On Wed, Aug 16, 2017 at 10:58 AM Jean-Baptiste Onofré wrote:
> > >
> > > > I will thanks !
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On Aug 16, 2017, 18:53, at 18:53, Asha Rostamianfar wrote:
> > > > >Hi everyone,
> > > > >
> > > > >I have a proposal to add a new built-in I/O source for VCF files:
> > > > >
> > > > >https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
> > > > >
> > > > >I'm planning to take on the implementation work myself, but wanted to
> > > > >get preliminary feedback about the proposed design as it requires making
> > > > >changes to the existing TextIO. I will file a JIRA FR as well.
> > > > >
> > > > >Please take a look at the doc and feel free to comment.
> > > > >
> > > > >Thanks,
> > > > >Asha
