Hello all,

It was nice to meet you last week!!!
I am writing a genomic PCollection, created from BigQuery, to a folder. Below is the code along with its output, so you can run it against any small BQ table. Let me know what you think:

    import apache_beam as beam

    rows = [{u'index': u'GSM2313641', u'SNRPCP14': 0},
            {u'index': u'GSM2316666', u'SNRPCP14': 0},
            {u'index': u'GSM2312355', u'SNRPCP14': 0},
            {u'index': u'GSM2312372', u'SNRPCP14': 0}]

    rows[1].keys()
    # output: [u'index', u'SNRPCP14']

    # You can change `archs4.results_20180308_*` to any other table name with an index column.
    # A backtick-quoted wildcard table is standard SQL syntax, so use_standard_sql must be True.
    queries2 = rows | beam.Map(lambda x: (
        beam.io.Read(beam.io.BigQuerySource(
            project='orielresearch-188115',
            use_standard_sql=True,
            query=str('SELECT * FROM `archs4.results_20180308_*` where index=\'%s\'' % (x["index"])))),
        str('gs://archs4/output/' + x["index"] + '/')))

    queries2
    # output: a list of (Read transform, output path) tuples; note these are unapplied
    # Read PTransforms, not PCollections:
    # [(<Read(PTransform) label=[Read] at 0x7fa6990fb7d0>, 'gs://archs4/output/GSM2313641/'),
    #  (<Read(PTransform) label=[Read] at 0x7fa6990fb950>, 'gs://archs4/output/GSM2316666/'),
    #  (<Read(PTransform) label=[Read] at 0x7fa6990fb9d0>, 'gs://archs4/output/GSM2312355/'),
    #  (<Read(PTransform) label=[Read] at 0x7fa6990fbb50>, 'gs://archs4/output/GSM2312372/')]

    # This is my challenge: write each Read's data to the path in the second element of its tuple.
    queries2 | 'write to relevant path' >> beam.io.WriteToText("SECOND COLUMN")

Do you have any idea how to sink the data to a text file? I have tried a few other options and got stuck at the write transform.

Any advice is much appreciated.

Thanks,
Eila

--
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/
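A Read transform built inside a `beam.Map` is only an object; it is never applied to the pipeline, so there is no data behind it to write. One common alternative (an assumption, not something from the post above) is to read the table once and route each row to an output location derived from its own `index` value. In newer Beam Python SDKs, `apache_beam.io.fileio.WriteToFiles` accepts a `destination` callable for this kind of per-key routing; check that your SDK version includes it. The sketch below shows only the routing logic in plain Python so it runs without a pipeline. The helper names `destination_for` and `route_rows`, and the one-JSON-line-per-row output format, are hypothetical.

```python
import json
from collections import defaultdict

# Output root taken from the original post.
OUTPUT_ROOT = 'gs://archs4/output/'


def destination_for(row):
    """Hypothetical key function: map a row to the folder named after its 'index'."""
    return OUTPUT_ROOT + row['index'] + '/'


def route_rows(rows):
    """Hypothetical helper: group rows by destination, one JSON text line per row.

    This mirrors in plain Python what a keyed write would do inside a pipeline:
    each row goes to the destination computed from its own 'index' value.
    """
    routed = defaultdict(list)
    for row in rows:
        routed[destination_for(row)].append(json.dumps(row, sort_keys=True))
    return dict(routed)


rows = [{u'index': u'GSM2313641', u'SNRPCP14': 0},
        {u'index': u'GSM2316666', u'SNRPCP14': 0},
        {u'index': u'GSM2312355', u'SNRPCP14': 0},
        {u'index': u'GSM2312372', u'SNRPCP14': 0}]

for path, lines in sorted(route_rows(rows).items()):
    print(path, lines)
```

Inside a pipeline, the same key function would be applied to rows coming out of a single BigQuery read rather than issuing one query per index.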