Re: [PR] Implemented MLTransform generate vocab Dataflow benchmark [beam]

via GitHub Fri, 08 May 2026 08:12:36 -0700


aIbrahiim commented on code in PR #38215:
URL: https://github.com/apache/beam/pull/38215#discussion_r3209566603



##########
sdks/python/apache_beam/examples/ml_transform/mltransform_generate_vocab.py:
##########
@@ -229,7 +229,7 @@ def run(argv=None, test_pipeline=None):
               vocab_filename='vocab'))
       | 'ExtractTransformedTokens' >> beam.Map(lambda row: row.text)
       | 'FlattenTokens' >> beam.FlatMap(list)
-      | 'DropEmptyTokens' >> beam.Filter(bool))
+      | 'DropEmptyTokens' >> beam.Filter(lambda token: token is not None))

Review Comment:
   Ahh, you are right: token indices from ComputeAndApplyVocabulary should
never be None. I will modify it.
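A minimal sketch in plain Python (no Beam dependency), assuming the flattened elements are integer vocabulary indices. The `rows` data below is made up for illustration. It shows why the two predicates in the diff behave differently. `bool` treats the valid index 0 as falsy and drops it. `token is not None` keeps it and would remove only `None`, which per the comment above should never occur.

```python
# Hypothetical per-row token indices, standing in for ComputeAndApplyVocabulary
# output; 0 is a valid vocabulary id, not an "empty" token.
rows = [[0, 5, 2], [3, 0], []]

# Mirror the FlattenTokens step (beam.FlatMap(list)): one element per index.
flattened = [token for row in rows for token in row]

# Original predicate, beam.Filter(bool): index 0 is falsy, so it is dropped.
kept_by_bool = list(filter(bool, flattened))

# Revised predicate from the diff: only None values would be removed.
kept_by_not_none = [token for token in flattened if token is not None]

print(kept_by_bool)      # [5, 2, 3]
print(kept_by_not_none)  # [0, 5, 2, 3, 0]
```

On data like this, with no `None` values, the second predicate keeps every element. That is consistent with the point that the indices should never be `None` in the first place.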



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
