[ https://issues.apache.org/jira/browse/BEAM-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev resolved BEAM-5844.
---------------------------------------
    Fix Version/s: Not applicable
       Resolution: Won't Fix

> Transition VCF IO to use Nucleus
> --------------------------------
>
>                 Key: BEAM-5844
>                 URL: https://issues.apache.org/jira/browse/BEAM-5844
>             Project: Beam
>          Issue Type: Task
>          Components: sdk-py-core
>            Reporter: Asha Rostamianfar
>            Assignee: Asha Rostamianfar
>            Priority: P3
>              Labels: stale-assigned
>             Fix For: Not applicable
>
>
> Currently, vcfio.py uses [PyVCF|https://github.com/jamescasbon/PyVCF] as its
> parser. Even though it's one of the popular VCF parsers, it is not actively
> maintained. There are also python3 compatibility issues (see BEAM-5628).
> There is a new FOSS parser from the Google Brain team, called
> [Nucleus|https://github.com/google/nucleus], that we can use instead. It has
> other nice features like built-in protocol buffer support so that we no
> longer need to transform the internal structures into Variant objects (we can
> deprecate the existing Variant/VariantCall classes in favor of using the
> protos).
> The Google Cloud Healthcare & Life Sciences team is planning to switch to
> using Nucleus as its parser for the [Variant
> Transforms|https://github.com/googlegenomics/gcp-variant-transforms] tool.
> Once that is done, we'll sync the [vcfio.py
> code|https://github.com/googlegenomics/gcp-variant-transforms/blob/master/gcp_variant_transforms/beam_io/vcfio.py]
> back to the Beam SDK so that the wider community can use it as well
> (potentially with additional features, like ReadAllFromVCF and VCF sink).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)