AnandInguva commented on issue #29037: URL: https://github.com/apache/beam/issues/29037#issuecomment-1766872317
Okay, I have tested your code. When you pass `extra_package` in a dict the way you did, the Dataflow runner launches the job with `extra_package=tensorflow`, but the Dataflow worker ignores it because it expects a tarball. Your code worked on 2.50.0 because tensorflow is already installed in the 2.50.0 Apache Beam containers, even though you were passing a string to `extra_package`. The 2.51.0 containers do not include tensorflow, so you get an `ImportError`.

Accepting a string (for example, a module name) for `--extra_package` without raising an error is a bug, and we can fix it in `2.52.0`. If you set it as `options.view_as(SetupOptions).extra_packages = 'tensorflow'`, it throws a `RuntimeError`, but the pattern you mentioned above doesn't.

To work around this, you can use the `requirements_file` pipeline option and pass a `requirements.txt` that lists tensorflow.
