[ 
https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650355#comment-16650355
 ] 

Asha Rostamianfar commented on BEAM-5628:
-----------------------------------------

Would it be OK if we just delete vcfio.py? I don't think anyone is actually 
using it, and we haven't provided any documentation on how to use it anyway. 
If anyone is using it, they can either pin to an older version of Beam until 
we release an updated version, or use our implementation inside [Variant 
Transforms|https://github.com/googlegenomics/gcp-variant-transforms/blob/master/gcp_variant_transforms/beam_io/vcfio.py].

Context: our original goal was to move [vcfio.py from Variant 
Transforms|https://github.com/googlegenomics/gcp-variant-transforms/blob/master/gcp_variant_transforms/beam_io/vcfio.py]
to the Beam SDK so that the wider community can use it as well (we'd then 
delete that code on our end). This is still our goal, but we are planning to 
make significant changes to vcfio, including switching the parser from PyVCF 
to [Nucleus|https://github.com/google/nucleus], a better-supported parser 
recently developed by Google Brain. Given this new issue, it may be easier to 
just delete this transform and add it back once our transition to Nucleus has 
been completed.

I can send a PR to delete the transform and its PyVCF dependency.

> Several VcfIO tests fail in Python 3 with TypeError: cannot use a string 
> pattern on a bytes-like object
> --------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-5628
>                 URL: https://issues.apache.org/jira/browse/BEAM-5628
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Simon
>            Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py", line 336, in test_read_after_splitting
>     split_records.extend(source_test_utils.read_from_source(*source_info))
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py", line 101, in read_from_source
>     for value in reader:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 264, in read_records
>     for line in record_iterator:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 330, in __next__
>     record = next(self._vcf_reader)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py", line 543, in __next__
>     row = self._row_pattern.split(line.rstrip())
> TypeError: cannot use a string pattern on a bytes-like object
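For reference, the failure mode in the traceback above can be reproduced outside of Beam and PyVCF. The following is only a minimal sketch: `row_pattern`, `split_row`, and the sample record are hypothetical stand-ins, not the actual code in vcf/parser.py or vcfio.py, and the UTF-8 decoding is an assumption about the input. It shows that a regular expression compiled from a `str` pattern rejects `bytes` input in Python 3, and that decoding the line first avoids the error.

```python
import re

# Hypothetical stand-in for the parser's row pattern. The real pattern in
# vcf/parser.py may differ; what matters is that it is compiled from a str.
row_pattern = re.compile(r"\t")


def split_row(line):
    """Split one tab-separated record line, accepting either bytes or str."""
    if isinstance(line, bytes):
        # Assumption: the input is UTF-8 encoded text.
        line = line.decode("utf-8")
    return row_pattern.split(line.rstrip())


# A made-up record line, as it would arrive from a file opened in binary mode.
raw_line = b"20\t14370\trs6054257\tG\tA\n"

# Passing bytes to a str pattern raises the TypeError seen in the traceback.
try:
    row_pattern.split(raw_line.rstrip())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Decoding before splitting works for both bytes and str input.
fields = split_row(raw_line)
print(error_message)
print(fields)
```

Whether the decoding belongs in the caller or in the parser is a separate design question; the sketch only illustrates why the tests pass on Python 2, where `str` and bytes are the same type, and fail on Python 3.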



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
