[ 
https://issues.apache.org/jira/browse/BEAM-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-5844.
---------------------------------------
    Fix Version/s: Not applicable
       Resolution: Won't Fix

> Transition VCF IO to use Nucleus
> --------------------------------
>
>                 Key: BEAM-5844
>                 URL: https://issues.apache.org/jira/browse/BEAM-5844
>             Project: Beam
>          Issue Type: Task
>          Components: sdk-py-core
>            Reporter: Asha Rostamianfar
>            Assignee: Asha Rostamianfar
>            Priority: P3
>              Labels: stale-assigned
>             Fix For: Not applicable
>
>
> Currently, vcfio.py uses [PyVCF|https://github.com/jamescasbon/PyVCF] as its 
> parser. Although it is one of the more popular VCF parsers, it is not 
> actively maintained and has Python 3 compatibility issues (see BEAM-5628). 
> There is a new FOSS parser from the Google Brain team, called 
> [Nucleus|https://github.com/google/nucleus], that we can use instead. It 
> also has built-in protocol buffer support, so we would no longer need to 
> transform the internal structures into Variant objects (we can deprecate 
> the existing Variant/VariantCall classes in favor of using the protos).
> The Google Cloud Healthcare & Life Sciences team is planning to switch to 
> using Nucleus as its parser for the [Variant 
> Transforms|https://github.com/googlegenomics/gcp-variant-transforms] tool. 
> Once that is done, we'll sync the [vcfio.py 
> code|https://github.com/googlegenomics/gcp-variant-transforms/blob/master/gcp_variant_transforms/beam_io/vcfio.py]
>  back to the Beam SDK so that the wider community can use it as well 
> (potentially with additional features, like ReadAllFromVCF and VCF sink).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
