tvalentyn commented on a change in pull request #15900:
URL: https://github.com/apache/beam/pull/15900#discussion_r747018962



##########
File path: sdks/python/apache_beam/examples/fastavro_it_test.py
##########
@@ -135,29 +146,9 @@ def batch_indices(start):
     assert result.state == PipelineState.DONE
 
     with TestPipeline(is_integration_test=True) as fastavro_read_pipeline:
-
-      fastavro_records = \
-          fastavro_read_pipeline \
-          | 'create-fastavro' >> Create(['%s*' % fastavro_output]) \
-          | 'read-fastavro' >> ReadAllFromAvro() \
-          | Map(lambda rec: (rec['number'], rec))
-
-      def check(elem):
-        v = elem[1]
-
-        def assertEqual(l, r):
-          if l != r:
-            raise BeamAssertException('Assertion failed: %s == %s' % (l, r))
-
-        assertEqual(sorted(v.keys()), ['fastavro'])
-        fastavro_values = v['fastavro']
-        assertEqual(len(fastavro_values), 1)
-
-      # pylint: disable=expression-not-assigned
-      {
-          'fastavro': fastavro_records
-      } \
-      | CoGroupByKey() \
+      fastavro_read_pipeline \
+      | 'create-fastavro' >> Create(['%s*' % fastavro_output]) \
+      | 'read-fastavro' >> ReadAllFromAvro() \

Review comment:
Can we also compare the values for each key, to make sure that no values
were lost during the write-read operation?
I think this could be accomplished by running a CoGroupByKey over the
PCollection coming from `| 'read-fastavro' >> ReadAllFromAvro()` and a
PCollection of the generated data. Then, for each element in the CoGBK
output, we can extract the set of elements tagged with the first
PCollection and the set tagged with the second, and verify that the two
sets are the same.
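
A rough sketch of the check described above, not a drop-in patch. The names
`canonical`, `check`, and `add_round_trip_check` and the tags `'expected'` and
`'actual'` are hypothetical. It assumes the records are JSON-serializable dicts
keyed by their `'number'` field (as in the removed `Map` step), and it raises a
plain `AssertionError` where the test would probably raise
`BeamAssertException`.

```python
import json


def canonical(records):
  """Returns an order-insensitive, comparable form of dict records.

  Assumption: every record is a JSON-serializable dict.
  """
  return sorted(json.dumps(rec, sort_keys=True) for rec in records)


def check(elem):
  """Validates one CoGroupByKey output element: (key, {tag: iterable})."""
  key, grouped = elem
  expected = canonical(grouped['expected'])
  actual = canonical(grouped['actual'])
  if expected != actual:
    raise AssertionError(
        'Mismatch for key %r: expected %s, got %s' % (key, expected, actual))
  return elem


def add_round_trip_check(pipeline, expected_records, file_pattern):
  """Hypothetical helper that wires the check into a pipeline.

  Beam is imported here so that `check` stays usable without it.
  """
  import apache_beam as beam
  from apache_beam.io.avroio import ReadAllFromAvro

  # Generated data, keyed the same way as the records read back from Avro.
  expected = (
      pipeline
      | 'create-expected' >> beam.Create(expected_records)
      | 'key-expected' >> beam.Map(lambda rec: (rec['number'], rec)))
  actual = (
      pipeline
      | 'create-fastavro' >> beam.Create([file_pattern])
      | 'read-fastavro' >> ReadAllFromAvro()
      | 'key-actual' >> beam.Map(lambda rec: (rec['number'], rec)))
  # CoGroupByKey yields (key, {'expected': [...], 'actual': [...]}).
  return ({
      'expected': expected, 'actual': actual
  }
          | beam.CoGroupByKey()
          | 'check' >> beam.Map(check))
```

Comparing canonical sorted lists rather than Python sets sidesteps the fact
that dict records are not hashable, and it also catches duplicated records
that a set comparison would hide.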




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


